sex | width |
---|---|
G | 9.0 |
G | 9.0 |
G | 8.6 |
G | 9.3 |
B | 9.2 |
B | 9.7 |
B | 8.8 |
B | 9.3 |

# 28 Covariates

Dr. Mary Meyer is a statistics professor at Colorado State University. In 2006, she published an article recounting an episode from family life:

When my daughter was in fourth grade, I took her shopping for dress shoes. I was disappointed in the quality of girls’ shoes at every store in the mall. The shoes for boys were sturdy and had plenty of room in the toes. On the other hand, shoes for girls were flimsy, narrow, and had pointed toes. In spite of the better construction for boys, the costs of the shoes were similar! For children the same age, boys had shoes they could run around in, while girls’ shoes were clearly for style and not comfort.

Upon complaining about this state of affairs, I was told by sales representatives in two stores that boys actually had wider feet than girls, so needed wider shoes. Being very skeptical, I thought I would test this claim.

We will return to Dr. Meyer’s project in a little bit. For now, imagine how this situation might be addressed by someone who has yet to develop good statistical thinking skills. We will call this imagined protagonist “Mr. Shoebuyer.” Since the salesmen claimed that girls’ feet are narrower than boys’, Mr. Shoebuyer heads out to measure the widths of girls’ and boys’ shoes.

A shoe store provides a convenient place to measure the widths of many different shoe styles. Mr. Shoebuyer gets to the shoe store, heads to the children’s section, and starts measuring. For each shoe on display, he records the shoe width and whether the shoe is for girls or boys. Here are his data:

Once back home, Mr. Shoebuyer uses his calculator to find the mean width of the shoes in each group. His results surprise him:

sex | mean width |
---|---|
Girls | 8.98 cm |
Boys | 9.25 cm |

Mr. Shoebuyer happens to be your uncle. He knows you are taking a statistics course and asks you to check his arithmetic. Putting on a statistical-thinking hat to assess the effect size of sex on shoe width, you note the absence of a confidence interval. This omission is easy to fix.

`Shoebuyer_data %>% lm(width ~ sex, data = .) %>% conf_interval()`

term | .lwr | .coef | .upr |
---|---|---|---|
(Intercept) | 8.840 | 9.250 | 9.660 |
sexG | -0.848 | -0.275 | 0.298 |
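The same numbers can be checked without any add-on packages. The following is a minimal sketch that rebuilds `Shoebuyer_data` from the eight measurements listed earlier and uses base R's `confint()` as a stand-in for `conf_interval()`; the object names `group_means`, `mod`, and `ci` are chosen here for illustration only.

```r
# Mr. Shoebuyer's eight measurements, transcribed from his data table (widths in cm).
Shoebuyer_data <- data.frame(
  sex   = c("G", "G", "G", "G", "B", "B", "B", "B"),
  width = c(9.0, 9.0, 8.6, 9.3, 9.2, 9.7, 8.8, 9.3)
)

# Mean width in each group.
group_means <- tapply(Shoebuyer_data$width, Shoebuyer_data$sex, mean)
print(round(group_means, 2))

# Fit width ~ sex. "B" sorts first alphabetically, so it is the reference level:
# the intercept is the boys' mean and sexG is the girls-minus-boys difference.
mod <- lm(width ~ sex, data = Shoebuyer_data)

# Base-R stand-in for conf_interval(): 95% limits placed around the coefficients.
ci <- cbind(.lwr = confint(mod)[, 1], .coef = coef(mod), .upr = confint(mod)[, 2])
print(round(ci, 3))
```

Because boys are the reference level, the `sexG` row is the quantity of interest: the estimated difference in width, girls minus boys.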

Your uncle is at the table at Thanksgiving break. “Sorry, Uncle, but you don’t have nearly enough data to conclude that girls’ feet are narrower than boys’.” Translating the confidence interval into plus-or-minus format, you point out that the difference between the sexes is \(0.275 \pm 0.6\) cm. “You’ll need enough data to get that 0.6 margin of error down to something like 0.2 or lower.” You also point out that there might be a better place to collect data than a shoe store. “It’s the feet, not the shoes, that you want to measure.”
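The plus-or-minus format comes from the midpoint and half-width of the confidence interval. As a worked check against the `sexG` row of Mr. Shoebuyer's interval:

\[
\text{center} = \frac{-0.848 + 0.298}{2} = -0.275, \qquad \text{margin of error} = \frac{0.298 - (-0.848)}{2} = 0.573 \approx 0.6
\]

The negative sign only records that the girls' shoes were the narrower ones on average. As a rough rule of thumb (an approximation, not a result computed from the shoe data), the margin of error shrinks in proportion to \(1/\sqrt{n}\), so cutting it from 0.6 to 0.2 would take about \((0.6/0.2)^2 = 9\) times as many measurements.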

Aware of these pitfalls, Dr. Meyer worked with the third- and fourth-grade teachers at her daughter’s school to collect data. Being a statistical thinker, she thought about what data would illuminate the matter before carrying out the data collection. Her data, a sample of size \(n=39\), are recorded in the `KidsFeet` data frame.

`lm(width ~ sex, data = KidsFeet) %>% conf_interval()`

term | .lwr | .coef | .upr |
---|---|---|---|
(Intercept) | 8.980 | 9.190 | 9.400 |
sexG | -0.713 | -0.406 | -0.099 |

In plus-or-minus format, this confidence interval is \(-0.4 \pm 0.3\) cm. Whatever the format, Dr. Meyer’s data provide some evidence that girls’ feet are narrower than boys’.

As a statistical thinker, Dr. Meyer knows that even though the foot width is the original quantity of interest, other factors might play a role in the system. For example, boys’ feet might trend longer or shorter than girls’ feet. This possibility should be taken into account by looking at the effect size of `sex` on width, holding length constant. After all, a shoe buyer first tells the salesperson their foot length (or “size”); the salesperson then brings shoes of that size to try on.

`lm(width ~ sex + length, data=KidsFeet) %>% conf_interval()`

term | .lwr | .coef | .upr |
---|---|---|---|
(Intercept) | 1.1048182 | 3.6411683 | 6.1775184 |
sexG | -0.4947759 | -0.2325175 | 0.0297408 |
length | 0.1202348 | 0.2210250 | 0.3218151 |

Although `sex` is the explanatory variable of primary interest to Dr. Meyer’s question, she knows to include other explanatory variables that might play a role. Such explanatory variables, not of direct interest, are called “**covariates**.” Dr. Meyer’s statistical expertise led her to consider possible covariates *before* collecting her data and to take the trouble of measuring both foot length and width.

The confidence interval on the `sexG` coefficient includes zero when `length` is taken into account. Dr. Meyer’s little study provides evidence that even if girls’ shoes tend to be narrower than boys’, the feet inside them have about the same shape for both sexes.
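One way to read “holding length constant” is to write out the fitted model from the `width ~ sex + length` coefficients (rounded here to three decimals) and compare a boy and a girl with the same foot length. The 24 cm length below is an arbitrary value chosen only for illustration, and it assumes `length` is recorded in centimeters like `width`:

\[
\widehat{\text{width}} = 3.641 - 0.233\,[\text{sex} = \text{G}] + 0.221 \times \text{length}
\]

\[
\text{boy: } 3.641 + 0.221 \times 24 \approx 8.95 \text{ cm}, \qquad \text{girl: } 3.641 - 0.233 + 0.221 \times 24 \approx 8.71 \text{ cm}
\]

Whatever common length is plugged in, the length term cancels in the comparison, so the predicted girl-minus-boy difference is always the `sexG` coefficient, about \(-0.23\) cm.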

## All other things being equal

The common phrase “all other things being equal” is a critical qualifier in describing relationships. To illustrate: A simple claim in economics is that a high price for a commodity reduces the demand. For example, increasing the price of heating fuel will reduce demand as people turn down thermostats to save money. Nevertheless, the claim can be considered obvious only with the qualifier *all other things being equal*. For instance, the fuel price might have increased because winter weather has increased the demand for heating compared to summer. In that case, higher prices are associated with higher demand. Increased price can be expected to go with lower demand only when other variables, such as weather conditions, are held constant.

In economics, the Latin equivalent of “all other things being equal” is sometimes used: “**ceteris paribus**”. The economics claim would be, “higher prices are associated with lower demand, *ceteris paribus*.”
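The heating-fuel story can be mimicked with a small simulation. The sketch below uses entirely synthetic data: the variable names (`cold`, `price`, `demand`) and every number in it are invented for illustration and do not come from any real market. By construction, price lowers demand, while cold weather raises both price and demand.

```r
# Synthetic data, invented purely for illustration; nothing here is real market data.
set.seed(101)
n <- 500
cold   <- rnorm(n)                                          # how cold the weather is (standardized)
price  <- 2 + 3 * cold + rnorm(n, sd = 0.5)                 # cold weather pushes the price up
demand <- 10 + 8 * cold - 1 * price + rnorm(n, sd = 0.5)    # price lowers demand; cold raises it

# Letting the weather change as it will: price appears to go with HIGHER demand.
unadjusted <- coef(lm(demand ~ price))[["price"]]

# Holding the weather constant: the built-in negative effect of price reappears.
adjusted <- coef(lm(demand ~ price + cold))[["price"]]

print(c(unadjusted = unadjusted, adjusted = adjusted))
```

The coefficient on `price` changes sign once `cold` is included as a covariate, which is the *ceteris paribus* qualifier at work.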

Although the phrase “all other things being equal” has a logical simplicity, it is impractical to implement “all.” So instead of the blanket “all other things,” it is helpful to consider just “some other things” to be held constant, being explicit about what those things are. Other phrases along the same lines are “taking into account …” and “controlling for ….” Those additional variables that are to be considered are called “**covariates**.”

*not* a synonym for “important.” See Lessons 36 through 38.

## Letting things change as they will

Using covariates in models enables the relationship between a response and an explanatory variable to be described *ceteris paribus*, that is, “all other things being equal.” Another phrase used in news stories is “after adjusting for ….” This is appropriate since the *all* in “all other things” in reality refers only to those particular factors used as the covariates in the model. Dr. Meyer’s foot width results might be stated in everyday language as, “After adjusting for foot length, she found no difference in the widths of girls’ and boys’ feet.”

Not including covariates in a model amounts to “letting other things change as they will.” In Latin, this is “*mutatis mutandis*.” In the foot-width example, the model `width ~ sex` looks at the differences in foot width for the two sexes. However, sex is not the only thing associated with foot width. The model `width ~ sex` ignores all factors other than sex; it compares boys and girls *mutatis mutandis*, that is, letting other things change as they will. In this case, comparing boys and girls involves not just the possible differences in foot width but also the differences in other factors such as foot length and body weight.