Galton %>% summarize(vh = var(height))
vh |
---|
12.8 |
Daniel Kaplan
March 7, 2023
Interpretation: The heights of the people in the Galton data frame vary. The amount of this variability is the variance: 12.8 square-inches. In less strange units, the standard deviation is \(\sqrt{12.8\ \text{square-inches}} \approx 3.6\) inches.
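The link between variance and standard deviation can be checked directly in R. The following is a minimal sketch using only base R; the vector `heights` holds made-up illustrative values, not the Galton data.

```r
# Hypothetical heights in inches (illustrative values, not taken from Galton)
heights <- c(62, 65.5, 67, 70, 72.5)

v <- var(heights)   # variance, in square-inches
s <- sd(heights)    # standard deviation, in inches

# The standard deviation is the square root of the variance
all.equal(s, sqrt(v))

# The same conversion applied to the variance reported above
sd_galton <- round(sqrt(12.8), 1)
sd_galton
```

Taking the square root returns the measure of variability to the original units (inches), which is why the standard deviation is usually easier to interpret than the variance.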
The most common action for the rest of this course will be to create a model and summarize it.
Example: lm(height ~ mother + father, data = Galton)
height ~ mother + father is a tilde expression that specifies the roles of variables in the model. height is the response variable. mother and father are the explanatory variables.
data = Galton tells lm() to use the Galton data frame to construct the model corresponding to the tilde expression.
Example: Summarizing functions R2() and conf_interval()
n | k | Rsquared | F | adjR2 | p | df.num | df.denom |
---|---|---|---|---|---|---|---|
898 | 2 | 0.109 | 54.7 | 0.107 | 0 | 2 | 895 |
Interpretation: mother and father jointly explain about 10% of the variance in the height of their adult children.
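To see where "about 10% of the variance" comes from, here is a hedged sketch in base R. It assumes the Galton data frame is the one distributed in the mosaicData package, and it uses `summary()` rather than the course's R2() function, so the output layout differs from the table above.

```r
# Assumption: the Galton data frame comes from the mosaicData package
data(Galton, package = "mosaicData")

mod <- lm(height ~ mother + father, data = Galton)

# R-squared as reported by base R's model summary
r2 <- summary(mod)$r.squared

# Equivalent view: variance of the model's fitted values as a
# fraction of the variance of the response variable
r2_by_hand <- var(fitted(mod)) / var(Galton$height)

c(r2 = r2, r2_by_hand = r2_by_hand)
```

Both numbers should agree with each other and with the Rsquared column of the table: the model's fitted values carry roughly one tenth of the variance in height.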
Interpretation: According to this model, the equation for a person’s height in inches is:
\[\text{person's height} = 22.3 + 0.283\ \mathtt{mother} + 0.380\ \mathtt{father}\]
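The coefficients in the equation can be read off the fitted model and then used to evaluate it for particular parents. This is a sketch under two assumptions: Galton is loaded from the mosaicData package, and base R's `coef()` and `confint()` stand in for the course's summarizing functions. The parents' heights (64 and 69 inches) are hypothetical values chosen for illustration.

```r
# Assumption: the Galton data frame comes from the mosaicData package
data(Galton, package = "mosaicData")

mod <- lm(height ~ mother + father, data = Galton)

coef(mod)      # intercept and the coefficients on mother and father
confint(mod)   # 95% confidence intervals on those coefficients

# Hypothetical parents: mother 64 inches tall, father 69 inches tall
new_parents <- data.frame(mother = 64, father = 69)
model_value <- predict(mod, newdata = new_parents)
model_value

# The same calculation by hand, using the rounded coefficients from the equation
by_hand <- 22.3 + 0.283 * 64 + 0.380 * 69
by_hand
```

With the rounded coefficients the arithmetic is 22.3 + 18.112 + 26.22, or about 66.6 inches; `predict()` uses the unrounded coefficients, so its answer may differ slightly in the later digits.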
Models (built with lm()) quantify how to account-for/explain the variation in the response variable in terms of the variation in the explanatory variables.
Account-for/explain. Often when we use the word “explain” we mean to suggest a causal connection. For instance, this randomized clinical trial established that a particular blood-pressure drug leads to lower blood pressure, that is, it causes the blood pressure to go down.
When we say that “A causes B,” we don’t necessarily mean that A is the complete and total explanation for B. More often, we mean that “A contributes in some way to the value of B.” For instance, “high blood pressure increases mortality” does not mean that high blood pressure is the sole determinant of mortality. Instead, it means that high blood pressure contributes to an increased risk of mortality.
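A small simulation can make "contributes to, but does not fully determine" concrete. The sketch below is purely illustrative: the variables `A`, `B`, and `other`, and all the numbers in it, are hypothetical and written in base R.

```r
set.seed(1)
n <- 1000

A     <- rnorm(n)           # hypothetical cause
other <- rnorm(n, sd = 3)   # everything else that influences B
B     <- 1 * A + other      # A contributes to B but does not determine it

mod <- lm(B ~ A)

slope <- coef(mod)[["A"]]        # estimated contribution of A to B
r2    <- summary(mod)$r.squared  # share of the variance in B accounted for by A

c(slope = slope, r2 = r2)
```

By construction, each unit of A adds one unit to B, so A is a genuine cause of B. Even so, the model accounts for only a small share of the variance in B, because most of the variation comes from `other`.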
A DAG (Directed acyclic graph—unnecessarily intimidating name!) is a way of encoding a hypothesis of what causes what in a system. We discussed the system involving treating a battlefield casualty with a tourniquet. (Link to in-class activity.) The system—a “system” is a collection of components—involved USE of a tourniquet, SEVERITY of injury, staying alive long enough for ADMISSION to hospital, and post-hospital SURVIVAL. Common sense suggests some causal connections:
Other links were more hypothetical:
A DAG describes the hypothesized causal links among all the system components.
Use of sample() and dag_draw() with DAGs.
What is a “random trial”?
How (and why) to automate replication of random trials.
You can learn these things from the text and the worksheet for Lesson 20.
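As a preview of what automating a random trial can look like, here is a minimal sketch in base R. It does not use the course's sample() and dag_draw() tools. Instead it hand-codes a hypothetical two-variable system in which `x` contributes to `y`; the function name `one_trial`, the slope of 0.5, and the sample sizes are all assumptions made for illustration.

```r
set.seed(20)

# One random trial: simulate a sample from a hypothetical system in which
# x contributes to y, fit a model, and return the estimated coefficient on x
one_trial <- function(n = 100) {
  x <- rnorm(n)
  y <- 0.5 * x + rnorm(n)
  coef(lm(y ~ x))[["x"]]
}

# Automate the replication: run the trial 500 times and collect the results
slopes <- replicate(500, one_trial())

mean(slopes)   # centers near the slope built into the simulation (0.5)
sd(slopes)     # trial-to-trial variation in the estimate
```

Repeating the trial many times shows how much an estimate varies from one sample to the next, which a single trial cannot reveal.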