1 Exercises: Customizing functions

Exercise 8. 1 Demonstrate linear regression as a mechanism for specifying a set of functions then determining values for their coefficients in a linear combination by matching as closely as possible patterns detected in the data.
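One possible demonstration is sketched below. It assumes simulated data (the "true" coefficients 2, 3, and -0.5 and the noise level are invented for illustration) and uses base R's lm() to find the coefficients of a linear combination of the functions 1, \(x\), and \(x^2\).

```r
# Simulated data: the "true" coefficients are assumptions chosen for illustration.
set.seed(101)
x <- seq(0, 10, length.out = 200)
y <- 2 + 3 * x - 0.5 * x^2 + rnorm(length(x), sd = 0.5)

# Specify the set of functions (1, x, x^2). lm() then finds the coefficients
# of the linear combination that matches the data most closely
# in the least-squares sense.
fit <- lm(y ~ x + I(x^2))
coefs <- coef(fit)
print(round(coefs, 2))
```

The fitted coefficients should land close to the values used in the simulation, which is the sense in which regression "detects" the pattern in the data.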

File ID: seal-ring-scarf


Exercise 8. 2 In \(a x + b\), what are the functions that go into the linear combination? Answer the same question for \(a x^2 + b x + c\).

File ID: snail-hang-tv


Exercise 8. 3  

Caution: Next status step: complete the draft

Assigned to DTK

Some of the tasks you worked on in high-school algebra were, formally, about finding values for coefficients in linear combinations. You have already seen this in specific, usually made-up contexts: the point-slope or point-point form for a line, and the roots of a quadratic.

File ID: aspen-love-hamper


Exercise 8. 4 Examples of multiplying osc by various envelopes.

Making a flat-topped hill with hillside(hill()). Then multiply this by an osc(). Then get different “attack” and “decay” times by using a multiplication strategy: [hillside(t) - hillside(done - a t)] hill(t) osc(t)
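A minimal sketch of the envelope idea follows. The definitions of hillside() and osc() here are stand-ins (assumptions built from pnorm() and sin()), since the course's own versions may differ. The envelope is one possible multiplication strategy, a rising hillside times a falling hillside with different widths, and is not the bracketed formula above.

```r
# Stand-in definitions (assumptions): the course software may define these differently.
hillside <- function(t, center = 0, sd = 1) pnorm(t, mean = center, sd = sd)  # rises from 0 to 1
osc <- function(t) sin(2 * pi * t)                                            # one cycle per unit time

t <- seq(0, 20, length.out = 1001)

# Fast "attack" near t = 3 and slow "decay" near t = 14 (illustrative values).
attack <- hillside(t, center = 3, sd = 0.5)
decay <- hillside(-t, center = -14, sd = 3)   # a hillside run backwards falls from 1 to 0
envelope <- attack * decay

signal <- envelope * osc(t)

plot(t, signal, type = "l", xlab = "t", ylab = "envelope * osc(t)")
lines(t, envelope, lty = 2)
```

Changing the two sd values independently is what gives different attack and decay times.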

File ID: horse-hide-ship


Exercise 8. 5 Constructing local functions (unlike square() or cube()) by multiplying hillsides together. This creates a function that levels off rather than running off to infinity.
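As a rough illustration, the sketch below multiplies a rising hillside by a falling one. The hillside() definition is a stand-in based on pnorm(), and the function name local_bump() and its parameters are hypothetical, introduced only for this example.

```r
# Stand-in definition (assumption): a smooth transition from 0 to 1.
hillside <- function(x, center = 0, sd = 1) pnorm(x, mean = center, sd = sd)

# Hypothetical helper: the product of a rising and a falling hillside.
local_bump <- function(x, left = -2, right = 2, sd = 0.3) {
  rising <- hillside(x, center = left, sd = sd)
  falling <- hillside(-x, center = -right, sd = sd)
  rising * falling
}

x <- seq(-10, 10, length.out = 401)
plot(x, local_bump(x), type = "l", ylab = "output")

# Unlike x^2, the product stays bounded far from the origin.
print(c(bump = local_bump(100), square = 100^2))
```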

File ID: calf-hear-bulb


Exercise 8. 6 Showing a few natural splines. They head off linearly instead of as x^2 or x^3 or higher order.

x <- seq(0, 1, length=51)
foo <- splines::ns(x, df=2)
plot(foo[,1])

plot(foo[,2])

foo <- splines::ns(x, df=3)
plot(foo[,2])

plot(foo[,1])

plot(foo[,3])
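The claim that natural splines "head off linearly" can be checked numerically. The sketch below evaluates the same basis at equally spaced inputs beyond the right boundary (x = 1) and looks at second differences, which are zero for a straight line. The variable names are illustrative.

```r
x <- seq(0, 1, length = 51)
basis <- splines::ns(x, df = 3)

# Evaluate the basis functions at equally spaced inputs outside the data range.
x_beyond <- seq(1.5, 3.5, by = 0.5)
beyond <- matrix(as.numeric(predict(basis, newx = x_beyond)),
                 nrow = length(x_beyond))

# For a straight line, second differences at equally spaced inputs vanish.
second_diffs <- diff(beyond, differences = 2)
print(max(abs(second_diffs)))
```

A printed value at the level of floating-point round-off indicates that each basis function is linear beyond the boundary.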

File ID: boy-forgive-pen


Exercise 8. 7 Examples of textbook formulas created by multiplication of simple terms. Ask the students to write down the input variables, the coefficients, and whether they are a case of \(x \times y\) or \(x \times x\) or if some other functions are involved, as in \(f(x) \times f(y)\).

For instance, gravitational acceleration of any particle is proportional to the mass of the attracting object and inversely proportional to the square of the distance between the particle and the attracting object. Traditionally such knowledge of relationships has been described as a “law,” as in the “Universal Law of Gravitation,” the name for the relationship described in the previous sentence. In chemistry, the “Ideal Gas Law” holds that the product of the pressure of the gas (\(P\)) and the volume of the container (\(V\)) is proportional to the amount of gas (\(n\), in moles) and the temperature (\(T\)): altogether that \(PV = n R T\) (where \(R\) is a known constant of proportionality: 8.314 J mol\(^{-1}\) K\(^{-1}\)). Similarly, “Ohm’s Law” describing electrical current (\(I\)) and voltage (\(V\)) is that \(V = I R\), where \(R\) is the resistance of the electrical conductor.

File ID: kangaroo-eat-oven


Exercise 8. 8 Use Boyle’s data as an example. Ask students to try out various ways of building a function that matches the shape of the data.

File ID: doe-give-table


Exercise 8. 9 Modeling magnitudes: the relationship between power, mass, and rpm of an engine. Allometrics in animals.
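One common way to model magnitudes is a power law fitted on logarithmic scales. The sketch below uses simulated data only (the exponent 0.75, the multiplier 3, and the noise level are assumptions chosen for the simulation, not measurements) to show how the exponent appears as the slope of a straight-line fit to the logarithms.

```r
# Simulated data: every number here is an assumption made for illustration.
set.seed(42)
mass <- 10^runif(100, min = 0, max = 4)     # spans four orders of magnitude
true_exponent <- 0.75
output_power <- 3 * mass^true_exponent * exp(rnorm(100, sd = 0.1))

# On log scales, output = a * mass^b becomes log(output) = log(a) + b * log(mass),
# a linear combination whose coefficients lm() can find.
fit <- lm(log(output_power) ~ log(mass))
exponent_estimate <- coef(fit)[["log(mass)"]]
print(round(exponent_estimate, 2))
```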

File ID: finger-talk-clock


Exercise 8. 10 An example from physics, engineering, and acoustics: a common problem is modeling an object’s response to a shock. Think of a car tire hitting a pot-hole, an electrical circuit responding to nearby lightning, the pluck of a guitar string, or the striking of a tuning fork by a small hammer. The response is an oscillation whose amplitude diminishes in time. An appropriate model multiplies osc() by double() in this way:

\[\text{response}(t) \equiv \text{osc}(\omega\, t) \times \text{double}(-k\, t)\]
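The formula can be sketched in R as follows. The definitions of osc() and the doubling function are stand-ins (assumptions), and the doubling function is named double_fn() here so that it does not mask base R's double(). The values of \(\omega\) and \(k\) are illustrative.

```r
# Stand-in definitions (assumptions; the course's own functions may differ).
osc <- function(t) sin(t)
double_fn <- function(x) 2^x

# response(t) = osc(omega * t) * double(-k * t)
response <- function(t, omega = 2 * pi, k = 0.5) {
  osc(omega * t) * double_fn(-k * t)
}

t <- seq(0, 10, length.out = 1001)
plot(t, response(t), type = "l", xlab = "t", ylab = "response")
lines(t, double_fn(-0.5 * t), lty = 2)   # decaying amplitude envelope
```

With \(k = 0.5\), the amplitude halves every two time units, since double_fn(-0.5 * 2) is one half.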

File ID: finch-sit-painting


Exercise 8. 11 [Turn this into an exercise. E.g. …] On the internet, you can find a formula for the heat index:

HI <- function(T, H) {
  -42.379 + 2.04901523*T + 10.14333127*H - .22475541*T*H -
    .00683783*T*T - .05481717*H*H + .00122874*T*T*H +
    .00085282*T*H*H - .00000199*T*T*H*H
}

The formula has a visually striking appearance. It is a linear combination of nine functions. The formula specifies each coefficient to many decimal places. Impressive!

Or, maybe not. The expert modeler can see what is going on here. The individual functions in the linear combination follow a pattern: \(T\), \(H\), \(TH\), \(T^2\), \(H^2\), \(T^2 H\), \(T H^2\), \(T^2 H^2\). Given that humidity cannot be negative and that the domain for temperature where the concept of heat index applies involves only positive temperatures, the functions \(T\) and \(T^2\) are hardly different, and similarly for \(H\) and \(H^2\). Likewise \(T H\), \(T^2 H\), \(T H^2\), and \(T^2 H^2\) are all very similar.
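The similarity can be made concrete with a correlation. The sketch below assumes a domain of 80 to 110 degrees Fahrenheit and 40 to 100 percent relative humidity; these ranges are assumptions for illustration, not taken from the text.

```r
# Assumed domain (illustrative): degrees Fahrenheit and percent relative humidity.
temp <- seq(80, 110, by = 1)
humid <- seq(40, 100, by = 2)

# Over a domain that stays well away from zero, a variable and its square
# are almost perfectly correlated.
cor_temp <- cor(temp, temp^2)
cor_humid <- cor(humid, humid^2)
print(round(c(temperature = cor_temp, humidity = cor_humid), 4))
```

Correlations this close to 1 mean the fitted coefficients on \(T\) and \(T^2\) can trade off against each other with little change in the output.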

Activities based on the stability of the function: what happens when you get a coefficient wrong, or if you leave out one of the functions. The coefficients depend hugely on the units used for temperature (but relative humidity is dimensionless). Demonstration: A simpler formula when T and H are centered on the domain.
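A sketch of the "get a coefficient wrong" and "leave out a function" activities follows. It rewrites the formula as a coefficient vector times a vector of terms; the names hi_coefs, hi_terms(), and heat_index() are hypothetical. It assumes temperature in degrees Fahrenheit and relative humidity in percent, and evaluates at the illustrative point T = 90, H = 70.

```r
# Coefficients copied from the HI() formula, in the order of hi_terms().
hi_coefs <- c(-42.379, 2.04901523, 10.14333127, -0.22475541, -0.00683783,
              -0.05481717, 0.00122874, 0.00085282, -0.00000199)

# The nine functions in the linear combination.
hi_terms <- function(T, H) c(1, T, H, T * H, T^2, H^2, T^2 * H, T * H^2, T^2 * H^2)

heat_index <- function(T, H, coefs = hi_coefs) sum(coefs * hi_terms(T, H))

baseline <- heat_index(90, 70)

# Get one coefficient wrong: raise the T^2 * H coefficient by 1 percent.
bumped <- hi_coefs
bumped[7] <- 1.01 * bumped[7]
shift_from_bump <- heat_index(90, 70, bumped) - baseline

# Leave out one function: drop the T^2 * H^2 term.
dropped <- hi_coefs
dropped[9] <- 0
shift_from_drop <- heat_index(90, 70, dropped) - baseline

print(round(c(baseline = baseline, bump = shift_from_bump, drop = shift_from_drop), 2))
```

At this point a 1 percent error in a single coefficient moves the output by several degrees, and dropping the smallest coefficient moves it by tens of degrees, which is why so many decimal places are needed.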

File ID: goat-forget-hat


Exercise 8. 12 Chapter 7 described how the gaussian function is often used as a probability distribution function to represent an interval statement. But it might happen that a modeler prefers a distribution that is closer to uniform, like that in Fig E8. 1(b).

(a) Normal distribution
(b) Custom distribution
Figure E8. 1: Two possible relative probability functions to encode the idea of [5 to 15]
  1. Using linear combination of two functions, build a custom function that looks like Fig E8. 1(b). As a hint, here is a riddle: How do you build a hill out of two hillsides? We’ve scaffolded the problem: you just have to find appropriate values for the ??? in the input scaling, and perhaps change the sign on one of the coefficients of the linear combination.

Copy your function definition here.

  2. The hillside() function has a parameter sd that governs the speed of the transition from zero to one. Modify your function to make the output look more like a uniform distribution running from 5 to 15.

Copy your new function definition here.

File ID: tiger-build-stair


2 Enrichment:

  • Combining different sources of risk.
Caution: Next status step: complete the draft

Assigned to DTK

NEED TO ADD SOME QUESTIONS for students. MAYBE: Factor x increases risk by 22% from a baseline of 10%, factor y by 44%. What would be the risk if both factors x and y are present?

Policy makers and medical workers often use risk to quantify the chances of an event such as a tsunami or vulnerability to a disease. Often, there are several contributors to a given risk. For diabetes, for example, these include age, weight, (lack of) physical activity, and so on. It’s sensible to combine these contributions to find the overall risk.

Here’s a made-up example relating to risk of a hypothetical disease. Suppose a person with no risk factors has a probability of 10% of getting the disease in the next five years. This is called the baseline risk. Suppose that each of the risk factors (age, obesity, sedentary lifestyle) on its own triples the probability of getting the disease. This is called a risk ratio, and you will often see risk ratios reported for exposures to toxins or other hazards. To get the risk of the person with the risk factor, multiply the baseline risk by the risk ratio: \(3 \times 10\% = 30\%\). For example, a sedentary person who is neither old nor obese has a 30% chance of the disease.

Question: What is the probability of an old, sedentary, obese person getting the disease? Since each risk factor triples the chances of getting the disease, common sense may suggest that the risk for our old, sedentary, obese person will be \(3 \times 3 \times 3 \times 10\% = 270\%\). But no real probability can be larger than 100%.

The combination of risk factors was an unanswered riddle until about 1950. The accepted technique has come into common use only in the last several decades. The technique involves pipelining simple functions. The first step is to convert the baseline risk from a probability to another scale called odds. The conversion is simple:

\[\text{convert_to_odds}(\text{prob}) \equiv \frac{\text{prob}}{1 - \text{prob}}\]

There is similarly a function that will convert odds into the format of probability:

\[\text{convert_to_prob}(\text{odds}) \equiv \frac{\text{odds}}{\text{odds} + 1}\]

or, in computer notation:

convert_to_odds <- function(prob) prob / (1 - prob)
convert_to_prob <- function(odds) odds/ (1 + odds)

In our example, the baseline odds is 0.1/0.9 = 0.1111.

The next step is to convert the risk with the factor present (the risk ratio times the baseline risk) and the baseline risk into an odds ratio. This is a pipeline involving convert_to_odds() and ratio(), followed by doublings():

top: risk with factor \(\rightarrow\) convert_to_odds() \(\rightarrow\)
                                                  ratio(top, bottom) \(\rightarrow\) doublings()
bottom: baseline risk \(\rightarrow\) convert_to_odds() \(\rightarrow\)

For example, the probability of our obese-but-not-old-or-sedentary person getting the disease is 0.3. This corresponds to an odds of 0.3/0.7 = 0.43. The baseline odds is 0.1111. So the odds ratio associated with obesity is 0.43/0.1111 = 3.87. The odds ratio is often very close to the risk ratio: here, 3.87 compared to 3.

Next step … compute doublings() of the odds ratio. We can approximate this from what we know about double(): double(1) gives 2 and double(2) gives 4. So doublings(3.87) will be close to 2; the actual value is 1.95. This is called the log-odds-ratio.
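The steps so far can be sketched as below. The doublings() definition is a stand-in (assumed here to be the base-2 logarithm, the inverse of doubling), and the variable names are illustrative. Without rounding the intermediate odds to 0.43, the odds ratio comes out near 3.86 rather than 3.87; the log-odds-ratio still rounds to 1.95.

```r
convert_to_odds <- function(prob) prob / (1 - prob)
doublings <- function(x) log2(x)   # stand-in (assumption): inverse of 2^x

baseline_risk <- 0.10
risk_ratio <- 3
risk_with_factor <- risk_ratio * baseline_risk      # 0.3

# Odds ratio: odds with the factor divided by baseline odds.
odds_ratio <- convert_to_odds(risk_with_factor) / convert_to_odds(baseline_risk)

# Log-odds-ratio, measured in doublings.
log_odds_ratio <- doublings(odds_ratio)

print(round(c(odds_ratio = odds_ratio, log_odds_ratio = log_odds_ratio), 2))
```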

The accepted rule for combining multiple risk ratios is to add together the log-odds-ratio for each risk factor. So the overall log-odds-ratio for our old, sedentary, obese person will be 1.95 + 1.95 + 1.95 = 5.85. This sum is then added to the baseline log-odds. Since the baseline odds are 0.1111, the baseline log-odds are doublings(0.1111) = -3.17, and the disease log-odds for our triply burdened person will be 5.85 - 3.17 = 2.68.

Although risk calculations are done using odds and log-odds-ratios, it’s deemed better to use probability to communicate with policy-makers and patients. The computation is

2.68 |> double() |> convert_to_prob()
[1] 0.865021

Combining together three risk factors each of which has a risk ratio of 3 gives not the absurd 270% but an 86.5% risk of the disease. As a check, a single risk factor gives log-odds of 1.95 - 3.17 = -1.22, which converts back to a probability of about 0.30, matching the 30% risk stated earlier.

Use income => performance and argue for a composition with doublings() when it comes to income.

File ID: combining-sources-of-risk


  • Graphics of functions with two inputs
Caution: Next status step: complete the draft

Assigned to DTK

NEED TO ADD IN SOME TASKS FOR STUDENTS. Maybe choose the quadratic approximation that locally matches some part of the map or of a sketch_plot() function.

The surface of the Earth provides a nice example of a function. We’ll let location on the Earth be the input to the function, specifying position quantitatively with latitude and longitude. The output of the function will be the elevation above sea level. Maps are the means to display such functions, as in Fig E8. 2 which depicts a small island.

Figure E8. 2: 1958 US Geological Survey map of St. John in the US Virgin Islands in the Caribbean Sea.

Hikers are familiar with “topographic maps” which indicate the slopes, hills, mountains and valleys of a geographic area. Fig E8. 2 contains a topographic map of an island in the Caribbean Sea. Like geographic maps generally, position on the map corresponds to latitude and longitude. Many of the marks on the map are special features of the place: houses, roads, bays, and such. Green ink marks vegetation.

Fig E8. 2 is such a topographical map. The topography is indicated by brown lines called contours. All the points on a given contour are at the same elevation. For an example, locate the relatively thick brown line that passes near the upper corner of the final “N” in the label “ST JOHN.” That brown line runs in an irregular circuit. All the points on that circuit are at 200 feet elevation. You can tell this only by following the contour until you come to a label. Nearby brown lines run nearly parallel to the 200 foot elevation lines. These nearby lines are at different elevations; adjacent lines are separated by 20 vertical feet of elevation.

If there were a hiking trail that followed a contour, the trail would be absolutely level: no uphills or downhills. The shoreline of the island (contour at zero elevation) is similarly a kind of absolutely level path around the island. On the other hand, where a road crosses a contour, the road is leading from one elevation to another, that is, going uphill or downhill.

Geographic land forms can have complicated terrain. Over the centuries, cartographers (that is, map-makers) have adopted many graphical conventions to avoid inundating the human reader with that complexity. For instance, only a few of the contours are labelled. You need to count up or down from the labelled contours to figure out the elevation of the unlabelled contours. The map format is a compromise between helping you to see the overall pattern of the geography and letting you examine the details in a particular locale.

The sultriness function can be analogized to a map of terrain. Instead of the inputs being latitude and longitude, they are temperature and humidity. The output, sultriness, is measured in degrees, not the feet of elevation marked by the map of St. John. Fig E8. 3 shows the “topographical map” of the sultriness function. Since the word “topographic” refers specifically to elevation of the terrain, we call the displays of mathematical functions contour plots.

Figure E8. 3: A contour plot of the heat-index function. The function itself is shown by the black contours. Note that the temperature scale is inverted in order to match the plot to the format of the table in Figure 5.1.

The location on a contour plot indicates the values of the input quantities. The output of the function at any given location has to be inferred by the nearby contours.
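A contour plot of this kind can be sketched with base R graphics. The code below reuses the heat-index formula quoted in Exercise 8.11 and assumes a domain of 80 to 110 degrees Fahrenheit and 40 to 100 percent relative humidity; the domain and axis orientation are illustrative and do not reproduce the inverted temperature scale of Fig E8. 3.

```r
# Heat-index formula as quoted in Exercise 8.11.
HI <- function(T, H) {
  -42.379 + 2.04901523*T + 10.14333127*H - .22475541*T*H -
    .00683783*T*T - .05481717*H*H + .00122874*T*T*H +
    .00085282*T*H*H - .00000199*T*T*H*H
}

# Assumed domain (illustrative).
temp <- seq(80, 110, length.out = 61)
humid <- seq(40, 100, length.out = 61)

# Evaluate the function on a grid: one output for every (temp, humid) pair.
z <- outer(temp, humid, HI)

# Each contour joins input pairs that give the same output.
contour(temp, humid, z,
        xlab = "Temperature (deg F)", ylab = "Relative humidity (%)")
```

The output at a grid location that falls between two contours has to be inferred from their labels, just as with the elevation contours on the map.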

The contour plot in Fig E8. 3 presents the same information as Figure 5.1. Each format has its own merits and demerits. In Section 2.6 we will see that the “steepness” of a function has important things to say. The contour-plot format makes it easier to see steepness at a glance.

Graphics are in many ways intuitive. But they are mostly limited to presentations that fit on a piece of paper or a computer display or, ultimately, the two-dimensional form of our eye’s retina. In later chapters, we’ll develop non-graphical tools to deal with functions with more than two inputs. For the sake of developing intuition, we will introduce these tools in the context of functions with two inputs.

File ID: contour-plots-and-basis-functions


  • Piecewise smoothness

Exercise 8. 13 Babies, toddlers, children, adults, seniors, and the very elderly do not all die from the same causes. So why should there be a single smooth function? Divide mortality data into regimes that are each modeled by a simple function. Is there always a clear dividing point?

Show a death rate function based on mortality tables. Point out how all the details of reading the table can be hidden away in the function so that it has a very simple interface (API): mortality_rate(age, sex). Write it so that if sex is not specified, it uses the average of the male and female mortality rates.
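The interface described above might look like the sketch below. The table values are placeholders invented purely to make the code runnable; they are not real mortality data, and the age bands and the name mortality_table are hypothetical. A real version would read from published mortality tables.

```r
# PLACEHOLDER values, invented for illustration only: NOT real mortality data.
mortality_table <- data.frame(
  age_from = c(0, 1, 15, 45, 65, 85),   # lower edge of each age band (hypothetical bands)
  female   = c(0.005, 0.0002, 0.0007, 0.005, 0.03, 0.15),
  male     = c(0.006, 0.0003, 0.0015, 0.008, 0.04, 0.17)
)

# Simple interface: the table lookup is hidden inside the function.
mortality_rate <- function(age, sex = NULL) {
  row <- findInterval(age, mortality_table$age_from)   # which age band?
  if (is.null(sex)) {
    # No sex specified: use the average of the female and male rates.
    return((mortality_table$female[row] + mortality_table$male[row]) / 2)
  }
  sex <- match.arg(sex, c("female", "male"))
  mortality_table[[sex]][row]
}

print(mortality_rate(30, "female"))
print(mortality_rate(30))
```

Because the table is looked up by age band, the function is piecewise constant, which makes the dividing points between regimes explicit.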

File ID: beech-pack-stove

