Chapter 19 used the rules associated with evanescent $h$, that is, $\lim_{h\rightarrow 0}$, to confirm our claims about the derivatives of many of the pattern-book functions. We will call these rules h-theory for short. This chapter will use h-theory to find algebraic rules to calculate the derivatives of linear combinations of functions, products of functions, and compositions of functions. Remarkably, we can figure out these rules without specifying which functions are being combined. So the rules can be written abstractly using the pronouns $f()$ and $g()$. Later, we will apply those rules to specific functions, to show how the rules are used in practical work.
Derivatives of the basic modeling functions
Opinions vary about whether it is really worthwhile for students to do extensive problem sets involving differentiating arbitrarily complex functions. But to become conversant with calculus, it helps to know at a glance the derivatives of a few widely used functions. Section 19.3 introduced symbolic derivatives of each of the pattern-book functions. In general, the pattern-book functions have derivatives that are themselves either pattern-book functions or slight modifications of them. For instance, $\partial_x \sin(x) = \cos(x)$ and $\partial_x \ln(x) = 1/x$. Most simply of all, $\partial_x e^x = e^x$.
The basic modeling functions are the same as the pattern-book functions, but with bare $x$ replaced by the linear function $ax + b$. In other words, each of the basic modeling functions is a composition of the corresponding pattern-book function with $ax + b$. The derivatives of the basic modeling functions are simple variations on the derivatives of the pattern-book functions. It’s easy to learn the rules and well worthwhile committing them to memory.
The “chain rule” provides the underlying logic but takes an especially easy form for the pattern-book functions. We can wait until Section 23.5 to take on the chain rule in general. In the meanwhile, examples will suffice to see how the chain rule plays out in the basic modeling functions.
An example of a basic modeling function is $f(ax + b)$ where $f()$ is one of the pattern-book functions. Naturally, we could have used any input name, for instance $t$, but in the following we will stick to $x$. And while sometimes we write the input scaling as $ax + b$, other times we will write an equivalent form such as $a(x - x_0)$.
We wish to find the derivative of the basic modeling function with respect to its argument, that is, $\partial_x f(ax + b)$. The answer is simple:
$$\partial_x f(ax + b) = a\, f'(ax + b) \tag{23.1}$$
Or, writing the input scaling in another common form:
$$\partial_x f\big(a(x - x_0)\big) = a\, f'\big(a(x - x_0)\big) \tag{23.2}$$
Actually, Equations 23.1 and 23.2 apply to any function $f()$, not just the pattern-book functions. But for now we’re interested in the pattern-book functions.
Some examples illustrate the pattern.
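One way to see the pattern in action is to compare a numerical slope against the rule "multiply by $a$ and evaluate $f'$ at $ax + b$." The sketch below does this for three pattern-book functions. The helper `slope()`, the parameter values `a` and `b`, and the test input are all assumptions chosen for illustration; they do not come from the text.

```python
import math

def slope(fun, x, h=1e-6):
    """Central-difference estimate of the derivative of fun at x."""
    return (fun(x + h) - fun(x - h)) / (2 * h)

# Hypothetical scaling parameters, chosen only for illustration.
a, b = 3.0, -1.0

# Pattern-book functions paired with their known derivatives.
pattern_book = {
    "exp": (math.exp, math.exp),
    "sin": (math.sin, math.cos),
    "ln": (math.log, lambda y: 1.0 / y),
}

def compare_at(x):
    """Return {name: (numerical slope of f(a*x + b), a * f'(a*x + b))}."""
    results = {}
    for name, (f, f_prime) in pattern_book.items():
        numeric = slope(lambda u, f=f: f(a * u + b), x)
        rule = a * f_prime(a * x + b)
        results[name] = (numeric, rule)
    return results

comparison = compare_at(1.2)  # a*x + b = 2.6, inside the domain of ln
for name, (numeric, rule) in comparison.items():
    print(f"{name}: numerical {numeric:.6f}   rule {rule:.6f}")
```

The two columns should agree to several decimal places, which is what the rule predicts.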
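The pattern-book functions $\text{pnorm}()$ and $\text{dnorm}()$ are important exceptions here. We’ve been writing their basic-modeling versions as $\text{pnorm}(x, \text{mean}, \text{sd})$ and $\text{dnorm}(x, \text{mean}, \text{sd})$. We use this style of notation, rather than, say, $\text{dnorm}\left(\frac{x - \text{mean}}{\text{sd}}\right)$, because $\text{pnorm}()$ and $\text{dnorm}()$ use a statistics convention that makes $\text{pnorm}()$ range from 0 to 1 and, correspondingly, makes $\text{dnorm}()$ have unit “area under the curve.”
The consequence is that the simple chain-rule pattern does not apply. Instead, the derivatives are:
- $\partial_x\, \text{pnorm}(x, \text{mean}, \text{sd}) = \text{dnorm}(x, \text{mean}, \text{sd})$, as might be expected, but …
- $\partial_x\, \text{dnorm}(x, \text{mean}, \text{sd}) = -\frac{x - \text{mean}}{\text{sd}^2}\, \text{dnorm}(x, \text{mean}, \text{sd})$.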
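The two derivatives of the sigmoid and gaussian functions can be checked numerically. The following is a minimal sketch that assumes the usual statistics definitions of the normal density and cumulative distribution. The Python functions `dnorm()` and `pnorm()` here are stand-ins written for this example, and the values of `mean`, `sd`, and `x` are hypothetical.

```python
import math

def dnorm(x, mean=0.0, sd=1.0):
    """Gaussian density with unit area under the curve."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def pnorm(x, mean=0.0, sd=1.0):
    """Sigmoid (cumulative normal), ranging from 0 to 1."""
    z = (x - mean) / sd
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def slope(fun, x, h=1e-5):
    """Central-difference estimate of the derivative of fun at x."""
    return (fun(x + h) - fun(x - h)) / (2 * h)

# Hypothetical parameter values and input, for illustration only.
mean, sd, x = 2.0, 0.5, 2.3

pnorm_slope = slope(lambda u: pnorm(u, mean, sd), x)
dnorm_slope = slope(lambda u: dnorm(u, mean, sd), x)
dnorm_rule = -(x - mean) / sd**2 * dnorm(x, mean, sd)

print("slope of pnorm:", pnorm_slope, " dnorm:", dnorm(x, mean, sd))
print("slope of dnorm:", dnorm_slope, " rule: ", dnorm_rule)
```

Each printed pair should match closely. Note that no extra factor of $1/\text{sd}$ appears in the first derivative, because that factor is already built into $\text{dnorm}(x, \text{mean}, \text{sd})$.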
Using the rules
When you encounter a function that you want to differentiate, you first have to examine the function to decide which rule you want to apply. In the following, we will use the names $f()$ and $g()$, but in practice the functions will often be basic modeling functions, for instance $e^{kx}$ or $\sin\left(\frac{2\pi}{P}t\right)$, etc.
Step 1: Identify f() and g()
We will write the rules using two function pronouns, $f()$ and $g()$, which can stand for any functions whatsoever. It is rare to see the product $f(x)\,g(x)$ or the composition $f(g(x))$ written explicitly in terms of $f()$ and $g()$. Instead, you are given something like $e^x \ln(x)$. The first step in differentiating the product or composition is to identify what $f()$ and $g()$ are individually.
In general, $f()$ and $g()$ might be complicated functions, themselves involving linear combinations, products, and composition. But to get started, we will practice with cases where they are simple, pattern-book functions.
Step 2: Find f’() and g’()
For differentiating either products or compositions, you will need to identify both $f()$ and $g()$ (the first step) and then compute the derivatives $\partial_x f()$ and $\partial_x g()$. That is, you will write down four functions.
Step 3: Apply the relevant rule
Recall from Chapter 9 that we will be working with three important forms for creating new functions out of existing functions:
- Linear combinations, e.g. $a\,f(x) + b\,g(x)$
- Products of functions, e.g. $f(x)\,g(x)$
- Compositions of functions, e.g. $f(g(x))$
Differentiating linear combinations
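Linear combination is one of the ways in which we make new functions from existing functions. As you recall, linear combination involves scaling functions and then adding the scaled functions as in $a\,f(x) + b\,g(x)$: a linear combination of $f(x)$ and $g(x)$. We can easily use $h$-theory to show the result of differentiating a linear combination of functions. First, let’s figure out what $\partial_x\, a\,f(x)$ is. Going back to writing $\partial_x$ as a slope function:
$$\partial_x\, a\,f(x) = \lim_{h\rightarrow 0}\frac{a\,f(x+h) - a\,f(x)}{h} = a \lim_{h\rightarrow 0}\frac{f(x+h) - f(x)}{h} = a\, \partial_x f(x)$$
In other words, if we know the derivative $\partial_x f(x)$, we can easily find the derivative of $a\,f(x)$. Notice that even though $h$ was used in the derivation, it appears nowhere in the result $\partial_x\, a\,f(x) = a\,\partial_x f(x)$. The $h$ is solvent to get the paint on the wall and evaporates once its job is done.
Now consider the derivative of the sum of two functions, $f(x)$ and $g(x)$:
$$\partial_x \left[f(x) + g(x)\right] = \lim_{h\rightarrow 0}\frac{\left[f(x+h) + g(x+h)\right] - \left[f(x) + g(x)\right]}{h} = \lim_{h\rightarrow 0}\frac{f(x+h) - f(x)}{h} + \lim_{h\rightarrow 0}\frac{g(x+h) - g(x)}{h} = \partial_x f(x) + \partial_x g(x)$$
Because of how $\partial_x$ can be “passed through” a linear combination, mathematicians say that differentiation is a linear operator. Consider this new fact about differentiation as a down payment on what will eventually become a complete theory telling us how to differentiate a product of two functions or the composition of two functions. We will lay out the $h$-theory based algebra of this in the next two sections.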
We can summarize the h-theory result for linear combinations this way:
The derivative of a linear combination is the linear combination of the derivatives.
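That is:
$$\partial_x \left[a\,f(x) + b\,g(x)\right] = a\,\partial_x f(x) + b\,\partial_x g(x)$$
as well as
$$\partial_x \left[a\,f(x) + b\,g(x) + c\,u(x) + \cdots\right] = a\,\partial_x f(x) + b\,\partial_x g(x) + c\,\partial_x u(x) + \cdots$$
The derivative of a polynomial is a polynomial of a lower order. We can demonstrate this:
Consider the polynomial
$$p(x) = a + b\,x + c\,x^2$$
The derivative is
$$\partial_x p(x) = \partial_x a + b\,\partial_x x + c\,\partial_x x^2 = 0 + b + 2c\,x$$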
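Because a polynomial is a linear combination of power-law functions, the linear-combination rule reduces differentiating it to bookkeeping on the coefficients. Here is a minimal sketch under the assumption that a polynomial is stored as a list of coefficients, with `coeffs[k]` multiplying $x^k$. The function names and the sample polynomial are hypothetical.

```python
def poly_eval(coeffs, x):
    """Evaluate the polynomial sum(coeffs[k] * x**k) at x."""
    return sum(c * x**k for k, c in enumerate(coeffs))

def poly_derivative(coeffs):
    """Differentiate term by term: c * x**k becomes k * c * x**(k-1).

    The constant term differentiates to zero and is dropped, so the
    result has one fewer coefficient (a polynomial of lower order).
    """
    return [k * c for k, c in enumerate(coeffs)][1:]

# Hypothetical example: p(x) = 4 - 3x + 2x^2
p = [4.0, -3.0, 2.0]
dp = poly_derivative(p)  # coefficients of -3 + 4x

print("derivative coefficients:", dp)
print("derivative at x = 2:", poly_eval(dp, 2.0))
```

Applying `poly_derivative()` repeatedly shortens the list by one each time, mirroring the statement that each derivative lowers the order by one.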
Product rule for multiplied functions
The question at hand is how to compute the derivative $\partial_x \left[f(x)\,g(x)\right]$. Of course, you can always use numerical differentiation. But let’s look at the problem from the point of view of symbolic differentiation. And since $f()$ and $g()$ are just pronoun functions, we will assume you are starting out already knowing the derivatives $\partial_x f(x)$ and $\partial_x g(x)$.
This situation arises particularly when $f()$ and $g()$ are pattern-book functions for which you have already memorized $\partial_x f(x)$ and $\partial_x g(x)$, or are basic modeling functions whose derivatives are covered in the section on derivatives of the basic modeling functions.
The purpose of this section is to derive the formula for $\partial_x \left[f(x)\,g(x)\right]$ using $f()$, $g()$, $\partial_x f()$ and $\partial_x g()$. This formula is called the product rule. The point of showing a derivation of the product rule is to let you see how the logic of evanescent $h$ plays a role. In practice, everyone simply memorizes the rule, which has a beautiful, symmetric form:
$$\partial_x \left[f(x)\,g(x)\right] = \left[\partial_x f(x)\right] g(x) + f(x) \left[\partial_x g(x)\right]$$
and is even prettier in Lagrange notation (where $\partial_x f(x)$ is written $f'$):
$$\left[f\,g\right]' = f'\,g + f\,g'$$
As with all derivatives, the product rule is based on the slope function (Section 18.3). Symbolic derivatives also invoke evanescent $h$ (Chapter 19):
$$F'(x) \equiv \lim_{h\rightarrow 0}\frac{F(x+h) - F(x)}{h}$$
We also need two other statements about $h$ and functions:
(1) The derivative $g'(x)$ is the slope of $g()$ at input $x$. Taking a step of size $h$ from $x$ will induce a change of output of $h\,g'(x)$, so $g(x+h) = g(x) + h\,g'(x)$.
(2) Any result of the form $h\,F(x)$, where $F(x)$ is finite, gives 0. More precisely, $\lim_{h\rightarrow 0} h\,F(x) = 0$.
As before, we will put the standard disclaimer $\lim_{h\rightarrow 0}$ against dividing by $h$ until there are no such divisions at all, at which point we can safely use the equality $h = 0$.
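Suppose the function $F(x) \equiv f(x)\,g(x)$, a product of the two functions $f(x)$ and $g(x)$. Its derivative is
$$F'(x) = \lim_{h\rightarrow 0}\frac{f(x+h)\,g(x+h) - f(x)\,g(x)}{h}$$
We will replace $g(x+h)$ with its equivalent $g(x) + h\,g'(x)$ giving
$$F'(x) = \lim_{h\rightarrow 0}\frac{f(x+h)\left[g(x) + h\,g'(x)\right] - f(x)\,g(x)}{h}$$
$g(x)$ appears in both terms in the numerator, once multiplied by $f(x+h)$ and once by $f(x)$. Collecting those terms gives:
$$F'(x) = \lim_{h\rightarrow 0}\frac{\left[f(x+h) - f(x)\right] g(x) + \left[f(x+h)\,h\,g'(x)\right]}{h}$$
This has two bracketed terms added together over a common denominator. Let’s split them into separate terms:
$$F'(x) = \lim_{h\rightarrow 0}\frac{f(x+h) - f(x)}{h}\,g(x) + \lim_{h\rightarrow 0}\frac{f(x+h)\,h\,g'(x)}{h}$$
Notice that the second term has an $h$ both in the numerator and the denominator. $h$-theory tells us that $f(x+h) = f(x) + h\,f'(x)$, so the second term itself splits in two:
$$F'(x) = \lim_{h\rightarrow 0}\frac{f(x+h) - f(x)}{h}\,g(x) + \lim_{h\rightarrow 0}\frac{h}{h}\,f(x)\,g'(x) + \lim_{h\rightarrow 0}\frac{h}{h}\,h\,f'(x)\,g'(x)$$
The first term is $g(x)$ multiplied by the familiar form for the derivative of $f(x)$. In each of the last two terms there is an $h/h$ involved. This is safely set to 1, since the $\lim_{h\rightarrow 0}$ implies that $h$ will not be exactly zero. There remain no divisions by $h$ so we can drop the $\lim_{h\rightarrow 0}$ in favor of $h = 0$:
$$F'(x) = f'(x)\,g(x) + f(x)\,g'(x) + h\,f'(x)\,g'(x) = f'(x)\,g(x) + f(x)\,g'(x)$$
The last step relies on statement (2) above.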
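Some people find it easier to read the rule in Lagrange shorthand, where $f$ and $g$ stand for $f(x)$ and $g(x)$ respectively, and $f'$ (“f-prime”) and $g'$ (“g-prime”) stand for $\partial_x f(x)$ and $\partial_x g(x)$:
$$\left[f\,g\right]' = f'\,g + f\,g'$$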
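The role of evanescent $h$ in the product rule can be illustrated numerically: for a finite $h$, the difference quotient of $f(x)\,g(x)$ differs from $f'g + fg'$ by a leftover that shrinks as $h$ gets smaller. The sketch below assumes a particular pair of functions, $\sin(x)$ and $e^x$, and a particular input, all chosen only for illustration.

```python
import math

# Hypothetical choice of f() and g(), with their known derivatives.
def f(x):
    return math.sin(x)

def f_prime(x):
    return math.cos(x)

def g(x):
    return math.exp(x)

def g_prime(x):
    return math.exp(x)

def product_rule(x):
    """f'(x) g(x) + f(x) g'(x)"""
    return f_prime(x) * g(x) + f(x) * g_prime(x)

def difference_quotient(x, h):
    """Slope of f(x) g(x) over a finite step of size h."""
    return (f(x + h) * g(x + h) - f(x) * g(x)) / h

x0 = 1.0
step_sizes = (1e-1, 1e-2, 1e-3, 1e-4)
gaps = [abs(difference_quotient(x0, h) - product_rule(x0)) for h in step_sizes]

for h, gap in zip(step_sizes, gaps):
    print(f"h = {h:g}   gap = {gap:.2e}")
```

The printed gap should fall by roughly a factor of ten each time $h$ does, consistent with a leftover term proportional to $h$ that vanishes in the limit.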
Occasionally, mathematics gives us a situation where being more general produces simplicity.
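In the case of function products, the generalization is from products of two functions to products of more than two functions, e.g. $u(x)\,v(x)\,w(x)$.
The product rule here takes a form that makes the overall structure much clearer:
$$\partial_x \left[u(x)\,v(x)\,w(x)\right] = \left[\partial_x u(x)\right] v(x)\,w(x) + u(x) \left[\partial_x v(x)\right] w(x) + u(x)\,v(x) \left[\partial_x w(x)\right]$$
In the Lagrange shorthand, the pattern is even more evident:
$$\left[u\,v\,w\right]' = u'\,v\,w + u\,v'\,w + u\,v\,w'$$
As an example, consider the derivative of $x^3$ with respect to $x$. Obviously, $x^3 = x \cdot x \cdot x$, a product of three simple functions:
$$\partial_x \left[x \cdot x \cdot x\right] = \left[\partial_x x\right] x \cdot x + x \left[\partial_x x\right] x + x \cdot x \left[\partial_x x\right]$$
Since $\partial_x x = 1$ this collapses to $\partial_x x^3 = 3x^2$.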
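The pattern for several factors (differentiate one factor at a time, leave the others alone, and add up the results) translates directly into a loop. This is a sketch only. The function name `product_rule_many()` and the way functions and derivatives are passed in as parallel lists are assumptions made for this example.

```python
def product_rule_many(funs, derivs, x):
    """Derivative at x of the product of funs.

    funs[i] is a factor and derivs[i] is its derivative. Each term of
    the sum uses the derivative of exactly one factor and the plain
    values of all the others.
    """
    total = 0.0
    for i in range(len(funs)):
        term = derivs[i](x)
        for j, fun in enumerate(funs):
            if j != i:
                term *= fun(x)
        total += term
    return total

def identity(x):
    return x

def one(x):
    return 1.0

# x^3 written as x * x * x, evaluated at x = 2; the rule should give 3 * 2^2.
value = product_rule_many([identity] * 3, [one] * 3, 2.0)
print(value)
```

With two factors the same function reduces to the ordinary product rule, so nothing new needs to be memorized for longer products.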
Chain rule for function composition
A function composition, as described in Section 9.2, involves inserting the output of one function (the “interior function”) as the input of the other function (the “exterior function”). As we so often do, we will be using pronouns a lot. A list might help keep things straight:
- There are two functions involved in a composition. Generically, we call them $f()$ and $g()$. In the composition $f(g(x))$, the exterior function is $f()$ and the interior function is $g()$.
- Each of the two functions $f()$ and $g()$ has an input. In our examples, we use $y$ to stand for the input to the exterior function and $x$ for the input to the interior function.
- As with all rules for differentiation, we will need to compute the derivatives of the functions involved, each with respect to its own input. So these will be $\partial_y f(y)$ and $\partial_x g(x)$.
A reason to use different pronouns for the inputs to $f()$ and $g()$ is to remind us that the output $g(x)$ is in general not the same kind of quantity as the input $x$. In a function composition, the $f()$ function will take the output $g(x)$ as input. But since $g(x)$ is not necessarily the same kind of thing as $x$, why would we want to use the same name for the input to $f()$ as we use for the input to $g()$?
With this distinction between the names of the inputs, we can be even more explicit about the composition, writing $f(y = g(x))$ instead of $f(g(x))$. Had we used the pronoun $x$ for the input to $f()$, our explicit statement, although technically correct, would be confusing: $f(x = g(x))$!
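With all these pronouns in mind, here is the chain rule for the derivative $\partial_x f(g(x))$:
$$\partial_x f\big(g(x)\big) = \left[\partial_y f\right]\big(g(x)\big) \times \partial_x g(x)$$
Or, using the Lagrange prime notation, where $'$ stands for the derivative of a function with respect to its input, we have
$$\left[f(g(x))\right]' = f'\big(g(x)\big)\, g'(x)$$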
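To see the chain rule at work, the sketch below composes an exterior function with an interior one and compares the rule (the exterior derivative evaluated at the interior output, times the interior derivative) against a numerical slope. The particular functions $e^y$ and $\sin(x)$, the helper `slope()`, and the test input are assumptions for illustration.

```python
import math

def slope(fun, x, h=1e-6):
    """Central-difference estimate of the derivative of fun at x."""
    return (fun(x + h) - fun(x - h)) / (2 * h)

# Interior function g(x) and its derivative with respect to x.
def g(x):
    return math.sin(x)

def g_prime(x):
    return math.cos(x)

# Exterior function f(y) and its derivative with respect to y.
def f(y):
    return math.exp(y)

def f_prime(y):
    return math.exp(y)

def chain_rule(x):
    """f'(g(x)) * g'(x): the exterior derivative is evaluated at y = g(x)."""
    return f_prime(g(x)) * g_prime(x)

x0 = 0.7
numeric = slope(lambda x: f(g(x)), x0)
rule = chain_rule(x0)
print("numerical slope:", numeric)
print("chain rule:     ", rule)
```

A common slip is to evaluate the exterior derivative at $x$ rather than at $g(x)$. Keeping separate input names, as in `f(y)` and `g(x)`, makes that mistake easier to spot.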
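The chain rule can be used in a clever way to find a formula for $\partial_x \ln(x)$.
We’ve already seen that the logarithm is the inverse function to the exponential, and vice versa. That is:
$$e^{\ln(y)} = y \ \ \text{and}\ \ \ln(e^y) = y$$
Since $\ln(e^y)$ is the same function as $y$, the derivative $\partial_y \ln(e^y) = \partial_y y = 1$.
Let’s differentiate the second form using the chain rule:
$$\partial_y \ln(e^y) = \left[\partial_y \ln\right](e^y)\; e^y = 1$$
giving
$$\left[\partial_y \ln\right](e^y) = \frac{1}{e^y}$$
Whatever the function $\partial_y \ln()$ might be, it takes its input and produces as output the reciprocal of that input. In other words:
$$\partial_x \ln(x) = \frac{1}{x}$$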
In news and policy discussions, you will often hear about “inflation rate” or “birth rate” or “interest rate” or “investment rate of return.” In each case, there is a function of time combined with a derivative of that function, with the general form
$$\frac{\partial_t f(t)}{f(t)}$$
In other words, the “rate” is the size of the change with time ($\partial_t f(t)$) divided by the size of the whole ($f(t)$) at that time.
- Inflation rate: The function is cost_of_living().
- Population growth rate: The function is population().
- Interest rate: The function is account_balance().
- Investment returns: The function is net_worth().
In all these cases, the “rate” is not merely “per time” as would be the case for $\partial_t f(t)$. Instead the rate is “per unit of the whole per time.” For population growth rate, the “whole” is the population. Such rates involving people are often stated with the phrase “per capita per year.” (The Latin “per capita” translates to “by head.” Its modern sense is “per unit of population.” Of course, the “unit of population” is a person.)
Notice the two uses of “per” in the phrase: “births per capita per year.” A proportional rate is two rates in one. Births per capita is a proportion of the population. Births per year is an average rate with respect to time. But “births per capita per year” is a rate in the proportion with respect to time.
The rate word “per” also appears as part of “percent,” which literally means “per hundred.” A “percentage change” is the amount of change divided by the base amount. Confusingly, perhaps, “percentage change” is often truncated to the shorter “percent.” This is the case with inflation rates, interest rates, and rates of return on investment. The interest rate on a credit-card debt is stated as a proportion of the current debt; all that is packed into the word “percent.” The interest rate itself is the “proportion of the current debt per year”: two rates in one.
Similarly for an inflation rate. “Inflation” is stated as the change in prices divided by the current price: a proportional change. “Inflation rate” is the proportional change per unit of time, where the “whole” is current prices and the rate is change in current prices per year divided by current prices.
Thanks to the chain rule, there is a shortcut way of writing proportional rates per time. Exactly equivalent to the ratio $\frac{\partial_t f(t)}{f(t)}$ is
$$\partial_t \ln\big(f(t)\big)$$
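The equivalence between the ratio form and the derivative of the logarithm can be checked numerically. The sketch below uses a made-up `account_balance()` function (a starting balance of 1000 growing at 5% per year with continuous compounding). The numbers are hypothetical and serve only to illustrate the identity.

```python
import math

def slope(fun, t, h=1e-6):
    """Central-difference estimate of the derivative of fun at t."""
    return (fun(t + h) - fun(t - h)) / (2 * h)

def account_balance(t):
    """Hypothetical balance after t years: 1000 growing at 5% per year."""
    return 1000.0 * math.exp(0.05 * t)

def proportional_rate(fun, t):
    """Rate of change divided by the size of the whole."""
    return slope(fun, t) / fun(t)

def log_slope(fun, t):
    """Derivative of the logarithm of the function."""
    return slope(lambda u: math.log(fun(u)), t)

for year in (0.0, 3.0, 10.0):
    ratio = proportional_rate(account_balance, year)
    via_log = log_slope(account_balance, year)
    print(f"t = {year:4.1f}   ratio {ratio:.6f}   log slope {via_log:.6f}")
```

Both columns should show a value near 0.05 at every time, even though the balance itself, and hence its plain derivative, keeps growing.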
Derivatives of logarithms appear often in fields such as economics or finance, where it is common to consider the logarithm of the economic quantity to render changes as percent of the whole.
For instance, consider Figure 23.1, which shows the cumulative number of COVID cases during a period in 2020, early in the pandemic.
The two panels in Figure 23.1 show the same data about growing numbers of coronavirus cases, the left graph on linear axes, the right on the now-familiar semi-log axes.
Most people are excellent at comparing slopes, even if they find it difficult or tedious to quantify a slope with a number and units. For instance, a glance suffices to show that in the left graph, well through mid-March the red curve (Italy) is steeper on any given date than the blue curve (US). Correspondingly, the number of people with coronavirus was growing faster (per day) in Italy.
The right graph tells a different story: up until about March 1, the Italian cases were increasing faster than the US cases. Afterwards, the US sees a larger growth rate than Italy until, around March 19, the US growth rate is substantially larger than the Italy growth rate.
The previous two paragraphs and their corresponding graphs seem to contradict one another. But they are both accurate, truthful depictions of the same events. What’s different between the two graphs is that the left shows one kind of rate and the right shows another kind of rate. In the left, the slope is new-cases-per-day, the output of the derivative function
left graph: $\partial_t\, \text{cases}(t)$.
On the right, the slope is the proportional increase in cases per day, that is,
right graph: $\frac{\partial_t\, \text{cases}(t)}{\text{cases}(t)}$.
From the chain rule, we know that
$$\partial_t \ln\big(\text{cases}(t)\big) = \frac{1}{\text{cases}(t)}\, \partial_t\, \text{cases}(t)$$
Since the right graph is on semi-log axes, the slope we perceive visually is $\partial_t \ln(\text{cases}(t))$. That is an obscure-looking bunch of notation until the chain rule reveals it to be the rate of change in the number of COVID cases at time $t$ divided by the number of cases at time $t$.
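The derivation of the chain rule relies on two closely related statements which are expressions of the idea that near any value $x$ a function can be expressed as a linear approximation with the slope equal to the derivative of the function:
- (1) $g(x + h) = g(x) + h\,g'(x)$
- (2) $f(y + \epsilon) = f(y) + \epsilon\,f'(y)$, which is the same thing as (1) but uses $y$ as the argument name and $\epsilon$ to stand for the small quantity we usually write with an $h$.
We will now look at $\partial_x f(g(x))$ by writing down the fundamental definition of the derivative. This, of course, involves the disclaimer $\lim_{h\rightarrow 0}$ until we are sure that there is no division by $h$ involved.
$$\partial_x f(g(x)) \equiv \lim_{h\rightarrow 0}\frac{f\big(g(x+h)\big) - f\big(g(x)\big)}{h}$$
Let’s examine closely the expression $f(g(x+h))$. Applying rule (1) above turns it into
$$f\big(g(x) + h\,g'(x)\big)$$
Now apply rule (2) but substituting $g(x)$ in for $y$ and $h\,g'(x)$ for $\epsilon$, giving
$$f\big(g(x) + h\,g'(x)\big) = f\big(g(x)\big) + h\,g'(x)\,f'\big(g(x)\big)$$
We will substitute this expression for the $f(g(x+h))$ expression in the definition of the derivative, giving
$$\partial_x f(g(x)) = \lim_{h\rightarrow 0}\frac{f\big(g(x)\big) + h\,g'(x)\,f'\big(g(x)\big) - f\big(g(x)\big)}{h}$$
In the numerator, $f(g(x))$ appears twice and cancels itself out. That leaves a single term with an $h$ in the numerator and an $h$ in the denominator. Those $h$’s cancel out, at the same time obviating the need for $\lim_{h\rightarrow 0}$ and leaving us with the chain rule:
$$\partial_x f(g(x)) = f'\big(g(x)\big)\, g'(x)$$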