P <- makeFun(0.5*S + 0.5*T + 0.5*S*T - 0.5*S^2 ~ S & T)
contour_plot(P(S, T) ~ S & T, bounds(S=0:1, T=0:1))
26 Local approximations
We have now encountered three concepts in Calculus that will prove a great help in building models.
- Local functions. Chapter 5 looks at two functions that describe local behavior: the sigmoid and its derivative, the gaussian. In fact, any function can be used as a local function, so long as the modeler considers only a small part of the domain.
- Rates of change. Often, information about a system comes in the form of how the output from the system changes with respect to the input.
- Partial derivatives. We can greatly simplify a complex system by looking at the output with respect to only one of the inputs at a time. In science, experiments are often arranged to accomplish this: the output is monitored while one input is changed and the others forced to be constant.
Consider the problem of studying how plant growth is influenced by available water. It won’t be meaningful to compare tropical rain forests with high-latitude boreal forests. The availability of water is only one of many ways in which they differ. In a plant-growth experiment, one would hold the plant species and light exposure constant, and vary the water between specimens. In the language of partial differentiation, the result will be the partial derivative of growth rate with respect to water.
26.1 Thinking partially
The expression for a general low-order polynomial in two inputs can be daunting to think about all at once: \[g(x, y) \equiv a_0 + a_x x + a_y y + a_{xy} x y + a_{xx} x^2 + a_{yy} y^2\] As with many complicated settings, a good approach can be to split things up into simpler pieces. With a low-order polynomial, one such splitting up involves partial derivatives. A low-order polynomial has six potentially non-zero partial derivatives of first and second order, of which two are the same, so there are only five quantities to consider.
- \(\partial_x g(x,y) = a_x + a_{xy}y + 2 a_{xx} x\)
- \(\partial_y g(x,y) = a_y + a_{xy}x + 2 a_{yy} y\)
- \(\partial_{xy} g(x,y) = \partial_{yx} g(x,y) = a_{xy}\). These are the so-called mixed partial derivatives. It does not matter whether you differentiate by \(x\) first or by \(y\) first. The result will always be the same for any smooth function.
- \(\partial_{xx} g(x,y) = 2 a_{xx}\)
- \(\partial_{yy} g(x,y) = 2 a_{yy}\)
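The entries in this list can be checked mechanically. The following is a minimal sketch using base R's symbolic `D()` rather than the tools used elsewhere in this text. The coefficient values and the evaluation point are arbitrary choices made only for illustration.

```r
# Arbitrary illustrative coefficients (assumed, not from the text)
coefs <- list(a0 = 1, ax = 2, ay = -3, axy = 0.5, axx = -1, ayy = 4)
g <- quote(a0 + ax*x + ay*y + axy*x*y + axx*x^2 + ayy*y^2)

# First- and second-order partial derivatives, computed symbolically
dx  <- D(g, "x")
dy  <- D(g, "y")
dxy <- D(dx, "y")   # differentiate by x first, then y
dyx <- D(dy, "x")   # differentiate by y first, then x
dxx <- D(dx, "x")
dyy <- D(dy, "y")

# Evaluate each derivative at an arbitrary point (x, y) = (0.3, 0.7)
at <- c(coefs, list(x = 0.3, y = 0.7))
partials <- sapply(
  list(dx = dx, dy = dy, dxy = dxy, dyx = dyx, dxx = dxx, dyy = dyy),
  eval, envir = at
)
print(partials)

# A third-order derivative of a second-order polynomial
dxxy <- D(dxx, "y")
print(eval(dxxy, at))
```

With these values, the two mixed partials evaluate to the same number, `axy`, and the pure second derivatives evaluate to `2 * axx` and `2 * ayy`, whatever point is chosen.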
The above list states neutral mathematical facts that apply generally to any low-order polynomial whatsoever.1 Those facts, however, suggest a way of asking yourself questions that can help you shape the model of a given phenomenon based on what you already know about how things work.
To illustrate, consider the situation of modeling the effect of study \(S\) and of tutoring \(T\) (a.k.a. office hours, extended instruction) on performance \(P(S,T)\) on an exam. In the spirit of partial derivatives, we will assume that all other factors (student aptitude, workload, etc.) are held constant.
To start, pick fiducial values for \(S\) and \(T\) to define the local domain for the model. Since \(S=0\) and \(T=0\) are easy to envision, we will use those for the fiducial values.
Next, ask five questions, in this order, about the system being modeled.
- Does performance increase with study time? Don’t over-think this. Remember that the approximation is around a fiducial point. Here, a reasonable answer is, “yes.” We will take \(\partial_S P(S, T) > 0\) to imply that \(a_S > 0\). This is appropriate because, close to the fiducial point, the other contributors to \(\partial_S P(S, T)\), namely \(a_{ST}T + 2 a_{SS} S\), will be vanishingly small.
- Does performance increase with time spent being tutored? Again, don’t over-think this. Don’t worry (yet) that your social life is collapsing because of the time spent studying and being tutored, and the consequent emotional depression will cause you to fail the exam. We are building a model here and the heuristic being used is to consider factors in isolation. Since (as we expect you will agree) \(\partial_T P(S, T) > 0\), we have that \(a_T > 0\).
Now the questions get a little bit harder and will exercise your calculus-intuition since you will have to think about changes in the rates of change.
- This question has to do with the mixed partial derivative, which we’ve written variously as \(\partial_{ST} P(S,T)\) or \(\partial_{TS} P(S,T)\) and which it might be better to think about as \(\partial_S \left[\partial_T P(S,T) \right]\) or \(\partial_T \left[\partial_S P(S,T)\right]\). Although these are mathematically equal, often your intuition will favor one form or the other. Recall that we are working on the premise that \(\partial_S P(S,T) > 0\), or, in other words, study will help you do better on the exam. Now for \(\partial_T \left[\partial_S P(S,T)\right]\). This is a matter of whether some tutoring will make your study more effective. Let’s say yes here, since tutoring can help you overcome a misconception that is a roadblock to effective study. So \(\partial_{TS} P(S,T) > 0\), which implies \(a_{ST} > 0\).
The other way round, \(\partial_S \left[\partial_T P(S,T) \right]\) is a matter of whether increasing study will enhance the positive effect of tutoring. We will say yes here again, because a better knowledge of the material from studying will help you follow what the tutor is saying and doing. From pure mathematics, we already know that the two forms of mixed partials are equivalent, but to the human mind they sometimes (and incorrectly) appear to be different in some subtle, ineffable way.
In some modeling contexts, there might be no clear answer to the question of \(\partial_{xy}\, g(x,y)\). That is also a useful result, since it tells us that the \(a_{xy}\) term may not be important to understanding that system.
- On to the question of \(\partial_{SS} P(S,T)\), that is, whether \(a_{SS}\) is positive, negative, or negligible. We know that \(a_{SS} S^2\) will be small whenever \(S\) is small, so this is our opportunity to think about bigger \(S\). So does the impact of a unit of additional study increase or decrease the more you study? One point of view is that there is some moment when “it all comes together” and you understand the topic well. But after that epiphany, more study might not accomplish as much as before the epiphany. Another bit of experience is that “cramming” is not an effective study strategy. And then there is your social life … So let’s say, provisionally, that there is an argmax to study, beyond which point you’re not helping yourself. This means that \(a_{SS} < 0\).
- Finally, consider \(\partial_{TT} P(S, T)\). Reasonable people might disagree here, which is itself a reason to suspect that \(a_{TT}\) is negligible.
Answering these questions does not provide a numerical value for the coefficients on the low-order polynomial, and says nothing at all about \(a_0\), since all the questions are about change.
Another step forward in extracting what you know about the system you are modeling is to construct the polynomial informed by questions 1 through 5. Since you don’t know the numerical values for the coefficients, this might seem impossible. But there is another modeler’s trick that might help.
Let’s imagine that the domain of both \(S\) and \(T\) is the interval zero to one. This is not to say that we think one hour of study is the most possible, but simply to defer the question of what are appropriate units for \(S\) and \(T\). Very much in this spirit, for the coefficients we will use \(+0.5\) when our previous answers indicated that the coefficient should be greater than zero, \(-0.5\) when the answers pointed to a negative coefficient, and zero if we don’t know. Using this technique, the model is \[P(S, T) \equiv 0.5\, S + 0.5\, T + 0.5\, S T - 0.5\, S^2\] which mainly serves as a basis for checking whether our previous answers are in line with our broader intuition before we move on quantitatively.
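The placeholder-coefficient trick can also be sketched in base R, without `makeFun()`. In the sketch below, the names `signs`, `a`, `P_grid`, and `best_study` are introduced only for illustration. The signs are the answers given above to the five questions. The last step looks for the amount of study that maximizes \(P\) at a fixed level of tutoring, which is one way to check that the "argmax to study" from question 4 actually shows up in the model.

```r
# Signs read off from the five answers in the text:
# +1 (positive), -1 (negative), 0 (no clear answer)
signs <- c(S = 1, T = 1, ST = 1, SS = -1, TT = 0)
a <- 0.5 * signs  # the +0.5 / -0.5 / 0 placeholder coefficients

# Base-R stand-in for a makeFun() definition of P
P <- function(S, T) {
  a[["S"]] * S + a[["T"]] * T + a[["ST"]] * S * T +
    a[["SS"]] * S^2 + a[["TT"]] * T^2
}

# Tabulate P on a coarse grid over the unit square
S_vals <- seq(0, 1, by = 0.25)
T_vals <- seq(0, 1, by = 0.25)
P_grid <- outer(S_vals, T_vals, P)
dimnames(P_grid) <- list(S = S_vals, T = T_vals)
print(P_grid)

# Amount of study in [0, 1] that maximizes P at a fixed level of tutoring
best_study <- function(T) {
  optimize(function(S) P(S, T), interval = c(0, 1), maximum = TRUE)$maximum
}
print(sapply(c(0, 0.5), best_study))
```

Setting \(\partial_S P = 0.5 + 0.5\,T - S\) to zero gives the same answer by hand: the best amount of study is \(S = 0.5 + 0.5\,T\), so more tutoring pushes the optimum toward more study.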
Notice that for small values of \(T\), the horizontal spacing between adjacent contours is large. That is, it takes a lot of study to improve performance a little. At large values of \(T\) the horizontal spacing between contours is smaller.
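The spacing claim can be checked numerically without a plot. The sketch below is an illustration, not part of the original model-building workflow: it asks how much study, starting from \(S = 0\), is needed to raise \(P\) by a fixed amount while \(T\) is held constant. The helper name `study_needed` and the contour step of 0.1 are assumptions made for this example.

```r
P <- function(S, T) 0.5 * S + 0.5 * T + 0.5 * S * T - 0.5 * S^2

# Study required, starting at S = 0, to raise P by `delta`
# while tutoring is held fixed at T
study_needed <- function(T, delta = 0.1) {
  gain <- function(S) P(S, T) - P(0, T) - delta
  uniroot(gain, interval = c(0, 0.5), tol = 1e-8)$root
}

spacing <- sapply(c(low_T = 0, high_T = 1), study_needed)
print(spacing)
```

The value for `low_T` comes out larger than the value for `high_T`, which is the numerical counterpart of the wider horizontal contour spacing at small \(T\). This is the positive \(a_{ST}\) term at work.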
26.2 Finding coefficients from data
Low-order polynomials are often used for constructing functions from data. In this section, I’ll demonstrate briefly how this can be done. The full theory will be introduced in Block 5 of this text.
The data I’ll use for the demonstration is a set of physical measurements of height, weight, abdominal circumference, etc. on 252 human subjects. These are contained in the Body_fat data frame, shown below.
One of the variables records the body-fat percentage, that is, the fraction of the body’s mass that is fat. This is thought to be an indicator of fitness and health, but it is extremely hard to measure and involves weighing the person when they are fully submerged in water. This difficulty motivates the development of a method to approximate body-fat percentage from other, easier-to-make measurements such as height, weight, and so on.
For the purpose of this demonstration, we will build a local polynomial model of body-fat percentage as a function of height (in inches) and weight (in pounds).
The polynomial we choose will omit the quadratic terms. It will contain the constant, linear, and interaction terms only. That is \[\text{body.fat}(h, w) \equiv c_0 + c_h h + c_w w + c_{hw} h w\] The process of finding the best coefficients in the polynomial is called linear regression. Without going into the details, we will use linear regression to build the body-fat model and then display the model function as a contour plot.
mod <- lm(bodyfat ~ height + weight + height*weight,
          data = Body_fat)
body_fat_fun <- makeFun(mod)
contour_plot(body_fat_fun(height, weight) ~ height + weight,
             bounds(weight=c(100, 250), height = c(60, 80))) %>%
  gf_labs(title = "Body fat percentage")
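To see how the fitted coefficients relate to the polynomial \(c_0 + c_h h + c_w w + c_{hw} h w\), here is a self-contained sketch in base R. It does not use the Body_fat measurements. It uses simulated data generated from made-up coefficients, so the numbers it prints say nothing about real body fat. The names `sim` and `d_weight` are introduced only for this illustration, and `body_fat_fun` is rebuilt by hand from `coef()` instead of with `makeFun()`.

```r
set.seed(1)
n <- 200

# Simulated stand-in data: NOT the Body_fat measurements
sim <- data.frame(
  height = runif(n, 60, 80),    # inches
  weight = runif(n, 100, 250)   # pounds
)
# Made-up "true" coefficients plus noise, for illustration only
sim$bodyfat <- 10 - 0.5 * sim$height + 0.3 * sim$weight -
  0.002 * sim$height * sim$weight + rnorm(n, sd = 2)

# Same model formula as in the text: constant, linear, and interaction terms
mod <- lm(bodyfat ~ height + weight + height*weight, data = sim)
cf <- coef(mod)
print(cf)

# Rebuild the model function directly from the fitted coefficients
body_fat_fun <- function(height, weight) {
  unname(cf["(Intercept)"] + cf["height"] * height +
           cf["weight"] * weight + cf["height:weight"] * height * weight)
}

# Partial derivative with respect to weight: c_w + c_hw * h
d_weight <- function(height) {
  unname(cf["weight"] + cf["height:weight"] * height)
}

print(body_fat_fun(70, 180))
print(d_weight(70))
```

Because of the interaction term, the partial derivative with respect to weight is not a single number: it depends on height through \(c_{hw}\).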
Block 5 looks at linear regression in more detail.
1. Note that any other derivative you construct, for instance \(\partial_{xxy} g(x,y)\), must always be zero.