Instructor Teaching Notes for Lesson 30

Math300Z

Author

Daniel Kaplan

Published

March 8, 2023

Aside: Confidence bands and global warming.

Cherry blossoms from Kyoto over 1200 years.

Smoking and cancer: a 1950s controversy

Deniers of a smoking/cancer link claimed there was a common cause for both: a “cancer gene.”

Smoking Smoking Lung cancer Lung cancer Smoking->Lung cancer Smoking gene Smoking gene Smoking gene->Smoking Smoking gene->Lung cancer

The gene had not been identified, so no data could be collected on it.

This is an example of confounding: the effects of the (supposed) gene and of smoking are mixed together.

Discovering the rules for small DAGs

Class activity

More complex DAGs

In considering the relationship between two nodes, enumerate each of the paths that connect the two nodes.

Example: Smoking with a non-genetic mediator: Tar

There are two paths from Tar to Lung cancer:

Smoking gene Smoking gene Lung cancer Lung cancer Smoking gene->Lung cancer Smoking Smoking Smoking gene->Smoking Path 2 Tar Tar Smoking->Tar Tar->Lung cancer Path 1

Two types of path between two endpoint nodes:

  1. A correlating path: Starting from some node on the path, causal influence can flow (along the arrows) to both endpoints.

BLOCK a correlating path by using some node along it as a covariate. Otherwise, it’s open.

  1. A colliding path: There’s no node on the path from which causal influence can flow (along the arrows) to both endpoints.

OPEN a colliding path by using the collider as a covariate. Otherwise, it’s closed.

one <- dag_make(
  A ~ exo(),
  B ~ A + exo(),
  D ~ A + exo(), 
  C ~ B + D +exo()
)
dag_draw(one)

Samp <- sample(one, size=1000)
lm(D ~ A, data=Samp) |> conf_interval()
# A tibble: 2 × 4
  term           .lwr  .coef   .upr
  <chr>         <dbl>  <dbl>  <dbl>
1 (Intercept) -0.0346 0.0286 0.0919
2 A            0.962  1.02   1.09  
lm(D ~ A + B, data=Samp) |> conf_interval()
# A tibble: 3 × 4
  term           .lwr    .coef   .upr
  <chr>         <dbl>    <dbl>  <dbl>
1 (Intercept) -0.0346  0.0286  0.0919
2 A            0.941   1.03    1.12  
3 B           -0.0702 -0.00623 0.0578
lm(D ~ A + C, data=Samp) |> conf_interval()
# A tibble: 3 × 4
  term           .lwr  .coef   .upr
  <chr>         <dbl>  <dbl>  <dbl>
1 (Intercept) -0.0321 0.0189 0.0699
2 A            0.252  0.329  0.406 
3 C            0.312  0.341  0.370 
lm(D ~ A + B + C, data=Samp) |> conf_interval()
# A tibble: 4 × 4
  term           .lwr   .coef    .upr
  <chr>         <dbl>   <dbl>   <dbl>
1 (Intercept) -0.0285  0.0149  0.0583
2 A            0.440   0.508   0.576 
3 B           -0.589  -0.535  -0.481 
4 C            0.484   0.514   0.544 
one <- dag_make(
  A ~ exo(),
  E ~ 10*C + exo(),
  B ~ A + exo(),
  D ~ A + exo(), 
  C ~ B + D +exo()
)
dag_draw(one)

Samp <- sample(one, size=1000)
lm(D ~ A, data=Samp) |> conf_interval()
# A tibble: 2 × 4
  term           .lwr   .coef   .upr
  <chr>         <dbl>   <dbl>  <dbl>
1 (Intercept) -0.0539 0.00950 0.0729
2 A            0.902  0.963   1.02  
lm(D ~ A + C, data=Samp) |> conf_interval()
# A tibble: 3 × 4
  term           .lwr    .coef   .upr
  <chr>         <dbl>    <dbl>  <dbl>
1 (Intercept) -0.0559 -0.00416 0.0476
2 A            0.157   0.238   0.319 
3 C            0.323   0.354   0.385 
lm(D ~ A + E, data=Samp) |> conf_interval()
# A tibble: 3 × 4
  term           .lwr    .coef   .upr
  <chr>         <dbl>    <dbl>  <dbl>
1 (Intercept) -0.0564 -0.00453 0.0473
2 A            0.162   0.243   0.324 
3 E            0.0321  0.0352  0.0384

Can Tar be used to avoid the confounding due to genetics? How do you block the back-door pathway?






## The Berkeley graduate admissions data from 1973

::: {.cell}

```{.r .cell-code}
mod1 <- model_train(zero_one(admit, one="admitted") ~ gender,
                    data=UCB_applicants)
model_plot(mod1, x=dept, color=gender, nlevels=10) +
  ylab("Admitted")

:::

mod2 <- model_train(zero_one(admit, one="admitted") ~ gender*dept,
                    data=UCB_applicants)
model_plot(mod2, x=dept, color=gender, nlevels=10, data_alpha=0.1) +
  ylab("Admitted")

model_train(zero_one(admit, one="admitted") ~ gender, 
            data=UCB_applicants) |> conf_interval()
Waiting for profiling to be done...
# A tibble: 2 × 4
  term          .lwr  .coef   .upr
  <chr>        <dbl>  <dbl>  <dbl>
1 (Intercept) -0.931 -0.830 -0.732
2 gendermale   0.485  0.610  0.736
model_train(zero_one(admit, one="admitted") ~ gender + dept, 
            data=UCB_applicants) |> conf_interval()
Waiting for profiling to be done...
# A tibble: 7 × 4
  term          .lwr   .coef    .upr
  <chr>        <dbl>   <dbl>   <dbl>
1 (Intercept)  0.488  0.682   0.877 
2 gendermale  -0.259 -0.0999  0.0582
3 deptB       -0.258 -0.0434  0.172 
4 deptC       -1.47  -1.26   -1.05  
5 deptD       -1.50  -1.29   -1.09  
6 deptE       -1.99  -1.74   -1.49  
7 deptF       -3.65  -3.31   -2.98  

Back to Berkeley

Should we adjust for department? Let’s go to a DAG.

UCB_dag1 <- dag_make(sex ~ exo(),
                     dept ~ sex,
                     admit ~ sex + dept)
dag_draw(UCB_dag1, vertex.label.cex=1)

If we think that the connection sex \(\longrightarrow\) department is just a matter of personal choice (as in the 1975 Science article), then we should block the back-door pathway.

But if we think that sex \(\longrightarrow\) department reflects systemic issues such as which departments are considered important and get funding, or which careers women think they can succeed in, then we do not want to block the backdoor pathway.

UCB_dag2 <- dag_make(sex ~ exo(),
                     success ~ sex,
                     dept_funding ~ sex,
                     dept ~ success,
                     admit ~ sex + dept + dept_funding)
dag_draw(UCB_dag2, vertex.label.cex=1)

Birthweight collider

Observations from the 1960s:

  • Smoking is associated with lower birthweight
  • Lower birthweight is associated with increased mortality

Question: Does smoking have a direct effect on mortality?

H Smoking Smoking Birth weight Birth weight Smoking->Birth weight Mortality of infant Mortality of infant Smoking->Mortality of infant ? Birth weight->Mortality of infant

How do you look at the direct effect of smoking on mortality? Block the other pathway by using birth weight as a covariate.

When this was done, by looking only at low-birthweight babies, it was found that smoking reduces mortality.

Might there be something else going on? Is there another cause for low birthweight?

J Birth defect Birth defect Birth weight Birth weight Birth defect->Birth weight Mortality of infant Mortality of infant Birth defect->Mortality of infant Smoking Smoking Smoking->Birth weight Smoking->Mortality of infant Birth weight->Mortality of infant