`Mod <- lm(mpg_city ~ displacement, data=MPG)`

# 24 Effect size

Regression modeling and confidence intervals provide a substantial toolbox to support statistical thinking. This Lesson starts to develop methods using modeling to inform decision-making. Decision-making takes many guises: whether to administer medicine, change a budget, raise or lower a price, respond to an evolving situation, and so on.

A useful simplification splits support for decision-making into two broad categories.

- **Making a prediction** for an individual choice. The need for predictions arises in both mundane and critical settings. For instance, an airline needs to set prices. They want to maximize revenue. Higher prices will bring in more money per seat but may reduce the number of people flying. To make the pricing decision, the airline needs a prediction about what the demand will be for those seats, which may vary based on price, day of the week, time of day, time of year, origin and destination of the flight, and so on. Another example: Merchants and social media sites must choose what products or posts to display to a viewer. Merchants have many products, and social media has many news feeds, tweets, and competing blog entries. The people who manage these websites want to promote the products or postings most likely to cause a viewer to respond. To identify viable products or postings, the site managers construct predictive models based on earlier viewers’ choices. We will study prediction models in Lessons 25 and 26.
- **Intervening** in a system. Such interventions occur on both grand scales and small: changes in government policies such as funding for preschool education or subsidies for renewable energy, closing a road to redirect traffic or opening a new highway or bus line, changing the minimum wage, etc. Before making such interventions, it is wise to know what the consequences are likely to be. Figuring this out is often a matter of understanding how the system works: what causes what. As interventions often affect multiple individuals, influencing the overall trend of the effect across individuals might be the goal instead of predicting how each individual will be affected.

This Lesson focuses on “**effect size**,” a measure of how changing an explanatory variable will play out in the response variable. Built into the previous sentence is an assumption that the explanatory variable *causes* the response variable. In Lessons 28 through 31, we will look into ways to make responsible claims about whether a connection between variables is causal. Here, we will focus on the calculation and interpretation of effect size.

## Effect size: Input to output

An intervention changes something in the world. Some examples are the budget for a program, the dose of a medicine, or the fuel flow into an engine. The thing being changed is the *input*. In response, something else in the world changes, for instance, the reading ability of students, the patient’s serotonin levels (a neurotransmitter), or the power output from the engine. The thing that changes in response to the change in input is called the “output.”

“**Effect size**” describes the change in the output with respect to the change in the input. The simplest case is when the output is a quantitative variable. In this case, the change in the output is a difference between two numbers. The form of the effect size depends on the input type. For example, for a quantitative input, the effect size will be a *ratio*, that is, a rate. (For calculus students: the effect size is a derivative of the output with respect to the input.)

To measure an effect size from data, construct a model with the output as the response variable and the input as an explanatory variable.
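As a minimal sketch of this recipe, the following uses base R's `predict()` in place of the `model_eval()` function used in these Lessons, and the built-in `mtcars` data frame as a stand-in for the `MPG` data. Both substitutions are assumptions made so that the example is self-contained. The code evaluates the model at two input values and divides the change in output by the change in input.

```r
# Stand-in data (an assumption): mtcars ships with R.
# mpg = miles per gallon, disp = engine displacement in cubic inches.
mod <- lm(mpg ~ disp, data = mtcars)

# Evaluate the model at two illustrative input values
inputs  <- data.frame(disp = c(100, 300))
outputs <- predict(mod, newdata = inputs)

# Effect size = change in output / change in input (mpg per cubic inch)
effect_size <- unname(diff(outputs) / diff(inputs$disp))
effect_size
```

For a straight-line model like this one, the ratio equals the coefficient on `disp`, whichever two input values are chosen.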

Some decision variables are categorical. For instance, a car buyer might like the idea of an engine that automatically turns off when the car is stopped at a light or in traffic. The `start_stop` variable, which has categorical levels “Yes” and “No,” records whether the car has this feature. Effect size estimation is slightly different when the input is categorical rather than quantitative. Still, the approach is the same: build a model and compare the change in output to the change in input:

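```
# Model annual fuel cost as a function of the Start/Stop feature
Mod3 <- lm(EPA_fuel_cost ~ start_stop, data=MPG)
# Evaluate the model at both levels of the categorical input
model_eval(Mod3, start_stop=c("No", "Yes"))
```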

| start_stop | .output | .lwr | .upr |
|---|---|---|---|
| No | 1872.193 | 916.0164 | 2828.369 |
| Yes | 1945.194 | 989.0637 | 2901.324 |

In this case, the change in output is $73 per year; the change in input is “Yes” - “No.” But, of course, it is meaningless to subtract one categorical level from another. Consequently, the effect size of `start_stop` on fuel cost cannot be quantified as a ratio. So, instead, the effect size is simply the difference in the output: a $73 per year increase with the Start/Stop feature.

The statistical thinker knows to pay attention to whether a calculated result makes sense. It seems unlikely that the Start/Stop feature causes more fuel to be consumed. Was there an error? Perhaps we did the subtraction backward? Check the report from `model_eval()` to make sure.

Here, the problem is not arithmetic. However, there is another possibility. It might be that manufacturers include the Start/Stop feature with big cars but not little ones. Then, even if Start/Stop might save gas when everything else is held constant, because the big cars use more fuel than little cars, it only *appears* that Start/Stop hurts fuel economy. This theory is, at this point, speculation: a hypothesis. Such a mixture of effects—big versus small car mixed with availability of Start/Stop—is called “**confounding**.” In Lessons 28 through 30, we discuss identifying and dealing with possible confounding.

## Categorical outputs

Sometimes the relevant effect size involves a categorical output variable. A case in point is the possible confounding of the Start/Stop feature with vehicle size. To investigate this, we should build a model with Start/Stop as the output and vehicle size as the input.

In this case, the issue of whether vehicle size causes Start/Stop is not essential. We are not concerned with the decisions made by automobile designers so much as with the possible confounding.

When the output variable is categorical, it is not reasonable to calculate the change in output as the difference in categories. As before, “Yes” - “No” is not a number. Still, there is a meaningful and helpful way to quantify a change in a categorical output.

The essential insight is to quantify the change in output in terms of probabilities. For instance, a small effect size would reflect a slight chance of the output changing from one level to another.

To model a categorical output, transform the output to a zero-one variable, as introduced in Lesson 19. We will present this in a demonstration here and return to the topic more fully in Lesson 34.
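A rough sketch of the zero-one approach follows. It uses the built-in `mtcars` data as a stand-in for the `MPG` data and base R's `predict()` in place of `model_eval()`. The choice of `am` (transmission type, already coded 0/1 in `mtcars`) as the output and `wt` (weight in thousands of pounds) as the input is an assumption made purely for illustration.

```r
# am is already a zero-one variable in mtcars: 0 = automatic, 1 = manual.
# A two-level categorical variable x could be converted the same way,
# for example with as.numeric(x == "Yes").
mod_prob <- lm(am ~ wt, data = mtcars)

# The model output is read as a probability that am equals 1
sizes <- data.frame(wt = c(2.5, 3.5))
probs <- predict(mod_prob, newdata = sizes)

# Effect size: change in probability per unit change in wt
prob_effect <- unname(diff(probs) / diff(sizes$wt))
prob_effect
```

The effect size here is a change in probability per thousand pounds of vehicle weight. A linear model of a zero-one variable can produce outputs below 0 or above 1 for extreme inputs, so it is best interpreted only over the range of the data.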

## Multiple explanatory variables

When a model has more than one explanatory variable, each has a different effect size.

As an example, consider the price of books. We have some data that might be informative, `moderndive::amazon_books`. What is the effect size of page count on price? The appropriate model here is `list_price ~ num_pages`. The effect size is easy to compute:

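```
# Model list price as a function of page count
Mod1 <- lm(list_price ~ num_pages, data = moderndive::amazon_books)
# Evaluate the model for a 200-page and a 400-page book
model_eval(Mod1, num_pages = c(200, 400))
```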

| num_pages | .output | .lwr | .upr |
|---|---|---|---|
| 200 | 15.82014 | -11.636987 | 43.27726 |
| 400 | 19.79643 | -7.637503 | 47.23037 |

We elected to compare 200-page books with 400-page books, simply because those seem like reasonable book lengths. According to the model, the longer book costs about 4 dollars more. So the effect size, to judge from this model, is $4 divided by 200 more pages, which comes to 2 cents per page.
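The arithmetic can be checked directly from the `.output` column of the report. The variable names below are made up for this illustration.

```r
# .output values copied from the model_eval() report
price_200 <- 15.82014
price_400 <- 19.79643

# Change in output (dollars) divided by change in input (pages)
effect_per_page <- (price_400 - price_200) / (400 - 200)
round(effect_per_page, 4)   # 0.0199 dollars per page, about 2 cents
```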

Another effect size is needed to address the question: Are hardcovers more expensive than paperbacks? The output is still price. But now, the input is categorical. In the `moderndive::amazon_books` data frame, the variable `hard_paper` has levels “P” and “H.” A possible model: `list_price ~ hard_paper`.

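```
# Model list price as a function of cover type
Mod2 <- lm(list_price ~ hard_paper, data = moderndive::amazon_books)
# Evaluate the model for paperback ("P") and hardcover ("H")
model_eval(Mod2, hard_paper = c("P", "H"))
```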

| hard_paper | .output | .lwr | .upr |
|---|---|---|---|
| P | 17.13523 | -10.62291 | 44.89338 |
| H | 22.39393 | -5.46052 | 50.24839 |

A hardcover book costs about $5.25 more than a paperback book. Since the input is categorical, there is no change of input to divide by, so the effect size is $5.25 when going from a paperback to a hardcover.

So far, we have looked at the effects of page length and cover type separately. Instead, we can include both as explanatory variables in a single model.

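```
# Model list price using both cover type and page count
Mod3 <- lm(list_price ~ hard_paper + num_pages, data = moderndive::amazon_books)
# Evaluate the model at every combination of the chosen input values
model_eval(Mod3, hard_paper = c("P", "H"), num_pages=c(200, 400))
```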

| hard_paper | num_pages | .output | .lwr | .upr |
|---|---|---|---|---|
| P | 200 | 14.52494 | -12.641928 | 41.69182 |
| H | 200 | 19.48253 | -7.785720 | 46.75077 |
| P | 400 | 18.43605 | -8.709404 | 45.58151 |
| H | 400 | 23.39363 | -3.847698 | 50.63497 |

This output requires some interpretation. We have got short and long paperback books and short and long hardcover books. What should we compare to what?

The convention is to consider each of the two inputs separately and hold the other input constant when we compare.

- *Effect size of `num_pages` on `list_price`*. To hold `hard_paper` constant, we will compare the two rows of the `model_eval()` report that have a “P” value for `hard_paper`. The difference in output for these two rows is $3.91. The effect size divides by the change in input (200 pages), so the effect size is just under 2 cents per page.
- *Effect size of `hard_paper` on `list_price`*. This time we will hold `num_pages` constant, say at 200 pages. Comparing the corresponding rows in the `model_eval()` output shows a change in list price of $4.96 when going from paperback to hardcover.

There is no special reason we decided to hold `hard_paper` constant at “P” rather than “H” or hold `num_pages` constant at 200 rather than 400. In general, the effect size will depend on the value being held constant. Choose a value that’s relevant to the purpose at hand.

In these Lessons we are building models with additive effects. That is what the `+` means in, say, `list_price ~ hard_paper + num_pages`. We do this to keep the effect-size story as simple as possible. (Occasionally, you will see examples with *multiplicative* effects, called “**interactions**.” The tilde expressions for such models involve `*` rather than `+`, as in `list_price ~ hard_paper * num_pages`.)
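To illustrate the difference between `+` and `*`, here is a small sketch using the built-in `mtcars` data rather than `amazon_books` (an assumption made so the code runs without extra packages), with base R's `predict()` standing in for `model_eval()`. The helper `effect_of_wt()` is hypothetical. It computes the effect size of `wt` on `mpg` while holding `am` constant at one of its two levels.

```r
# Stand-in data: treat transmission type as a categorical variable
car_data <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

mod_add <- lm(mpg ~ am + wt, data = car_data)  # `+`: additive effects only
mod_int <- lm(mpg ~ am * wt, data = car_data)  # `*`: includes an interaction

# Hypothetical helper: effect size of wt (mpg per 1000 lb), holding am at `level`
effect_of_wt <- function(model, level) {
  grid <- data.frame(am = factor(level, levels = levels(car_data$am)),
                     wt = c(2.5, 3.5))
  unname(diff(predict(model, newdata = grid)) / diff(grid$wt))
}

effects_add <- sapply(levels(car_data$am), function(lv) effect_of_wt(mod_add, lv))
effects_int <- sapply(levels(car_data$am), function(lv) effect_of_wt(mod_int, lv))

effects_add  # identical at both levels of am
effects_int  # differs depending on the level held constant
```

In the additive model, the effect size of `wt` does not depend on which level of `am` is held constant. With an interaction, it does, which is why the choice of the held-constant value matters more for such models.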

## Confidence intervals

Statistical thinkers know that any estimate they make, including estimates of effect sizes, involves sampling variation. Consequently, in reporting an effect size, always give an *interval* estimate: the confidence interval.

These Lessons usually involve model specifications that are linear, for example `y ~ x + a`. For such models, the effect size with respect to each variable is identical to the regression coefficient for that variable. Consequently, the confidence interval on the coefficient is also the confidence interval on the effect size.
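In base R, `confint()` reports confidence intervals on the coefficients of an `lm()` model. A brief sketch, with the built-in `mtcars` data as an assumed stand-in and an arbitrary choice of explanatory variables:

```r
# Linear model with two explanatory variables (stand-in data: mtcars)
mod <- lm(mpg ~ disp + wt, data = mtcars)

# Point estimates: for a linear model, these are the effect sizes
coef(mod)

# Interval estimates for the same quantities (95% confidence level)
ci <- confint(mod, level = 0.95)
ci
```

Each row of the result gives the lower and upper ends of the interval for one coefficient, so the rows for `disp` and `wt` can be read as confidence intervals on the corresponding effect sizes.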

The confidence interval communicates to the decision-maker the uncertainty in the estimated quantity. Sophisticated decision-makers keep this uncertainty in mind, considering the range of outcomes likely from whatever use they make of effect size. For example, suppose you have an effect size such as \(32 \pm 15\), that is, an interval from 17 to 47. In considering possible decisions, keep in mind the *entire interval*, not just the point estimate 32. Suppose the effect size 32 would lead you to decision \(\mathbb{A}\). If you would make a different decision \(\mathbb{B}\) for an effect size of, say, 20, which also lies inside the interval, then the precision of the effect size doesn’t enable you to distinguish meaningfully between decisions \(\mathbb{A}\) and \(\mathbb{B}\).

A particularly common and important situation involves deciding whether there is evidence to support any relationship at all between an explanatory variable and the response. You can think of this as deciding whether your data have detected a relationship. Whenever the confidence interval on the effect size with respect to that variable includes zero, it remains plausible that there is no relationship between that variable and the response variable.

Statistically naive decision-makers (even highly educated decision-makers can be statistically naive) look at the interval and sometimes ask the modeler, “Just give me a number. I don’t know what to do with two numbers.” Such a request might elicit a frank response: “If you don’t know what to do with two numbers, you also won’t know what to do with one number.” Unfortunately, that kind of frankness is not often well received; a reasonable alternative is: “The interval indicates the amount of uncertainty in the result. We’ll need to collect more data if you want to reduce the uncertainty.”

You can even estimate how much more data would be needed. Suppose the confidence interval were \(12 \pm 20\) estimated from a sample of size \(n=25\). Since this interval includes zero, it does not point definitively to the existence of a relationship. But the margin of error, 20 in this example, scales as \(1/\sqrt{n}\). If you make \(n\) bigger, you can expect the margin of error to become smaller: more data means better precision! How much better? If you quadruple \(n\), the margin of error will be about half as big. So \(n=100\) will give a margin of error about half the size of \(n=25\). In other words, with \(n=100\) the margin of error would be about 10.
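This scaling rule is simple enough to write as a pair of short functions. The names `projected_margin()` and `required_n()` are made up for this illustration, and the rule itself is only an approximation.

```r
# Margin of error scales as 1/sqrt(n): project from the current sample to a new size
projected_margin <- function(margin_now, n_now, n_new) {
  margin_now * sqrt(n_now / n_new)
}

projected_margin(20, n_now = 25, n_new = 100)   # 10

# Inverting the rule: sample size needed to reach a target margin of error
required_n <- function(margin_now, n_now, margin_target) {
  ceiling(n_now * (margin_now / margin_target)^2)
}

required_n(20, n_now = 25, margin_target = 5)   # 400
```

Halving the margin of error takes four times the data; cutting it to a quarter takes sixteen times the data.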

Keep in mind, however, this paradox. Although we know that a sample size of \(n=100\) will produce a margin of error half the size of that from a sample size of \(n=25\), we **cannot** expect the point estimate (12, in this example) to remain at that value in the larger sample. All we know is that in the larger sample the point estimate will plausibly be *somewhere* in the interval \(12 \pm 20\). So plausible results from the larger sample might be \(-8 \pm 10\) or \(32 \pm 10\) or anywhere in between. Even though we can estimate the size of the margin of error for the larger sample, to know the overall result from the larger sample, we have to collect that sample!