dag_draw(dag08)
Lesson 20: Worksheet
DAGs and simulations
Objectives
20.1 [Technical] Collect a sample from a DAG simulation.
20.2 [Technical] Examine the formulas behind a DAG simulation and compare to the results of a regression model trained on a sample from the DAG simulation.
20.3 [Conceptual] Recognize properties of a DAG. i. Identify exogenous nodes. ii. Identify all pathways between two specified end nodes. iii. On a given pathway, is there a causal flow from one end node to the other? iv. On a given pathway, is there a causal flow from some node on the pathway to both end nodes?
Part 1: Samples from DAGs
- Use dag_draw() to draw a picture of the dag08 directed acyclic graph. From this graph, explain why node c is exogenous and why x and y are not.
- Use print() to view the formulas used by dag08 to simulate data. What about the formula for y indicates that it receives inputs from x and c?
- There are three coefficients in the formula for y: an intercept, an x coefficient, and a c coefficient. (There is also some random input from an exogenous source unrelated to c or x.) What are the numerical values of the three coefficients?
- Collect a sample of size \(n=100\) from dag08 and use it to train the model with specification y ~ x. Do the coefficients reported match those you found in (3)? (If you are not sure, use a bigger sample size, say \(n=1000\) or even bigger.)
The model says the x coefficient is about 1.5, not the same as in the DAG formula for y.
- Similar to (4), but use the specification y ~ x + c. How do the coefficients for this model compare to those you found in (3)?
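The contrast between the two specifications in the last two items can be previewed with a minimal base-R sketch. It does not use dag08 or the worksheet's DAG functions. The node formulas and the coefficients (2, 1, 0.5, and -1) are hypothetical, chosen only to illustrate how leaving a common cause out of the model can change the x coefficient.

```r
# Hypothetical DAG (NOT dag08): c -> x, c -> y, and x -> y.
# All coefficients below are assumptions made for illustration.
set.seed(101)
n <- 10000

sim <- data.frame(c = rnorm(n))                        # c is exogenous: no inputs
sim$x <- 2 * sim$c + rnorm(n)                          # x receives input from c
sim$y <- 1 + 0.5 * sim$x - 1 * sim$c + rnorm(n)        # y receives input from x and c

fit_x  <- lm(y ~ x, data = sim)       # specification that leaves out c
fit_xc <- lm(y ~ x + c, data = sim)   # specification that includes c

coef(fit_x)    # x coefficient blends the direct path and the path through c
coef(fit_xc)   # x coefficient should be close to the 0.5 used above
```

Under these assumed formulas, the model y ~ x + c should recover coefficients close to those used in the simulation, while the x coefficient from y ~ x should be near 0.1 rather than 0.5, because it also picks up the association that runs through c.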
Part 2: Paths in DAGs
In dag08 there are two paths connecting x and y. One path is direct, \(X \longrightarrow Y\). The other path is indirect, \(X \longleftarrow C \longrightarrow Y\).
- Along the indirect path, is there a causal flow from x to y?
- Along the indirect path, is there a causal flow from any node on the graph that reaches both endpoints, x and y?
dag_school2 is a highly simplistic model of the relationship between expenditures on schools and student outcomes in terms of, say, standardized test scores.
dag_draw(dag_school2, vertex.label.cex=1, vertex.size=40)
There is a direct pathway from expenditure to outcome as well as another, indirect pathway.
- Are there any exogenous nodes in the graph?
- On the indirect pathway, is there a causal flow from expenditure to outcome?
- Is there a causal flow from any node on the indirect pathway to both expenditure and outcome? Which one?
Part 3: Are expenditures good for school outcomes?
- Look at the formulas for dag_school2. Is a higher expenditure connected to a higher outcome?
- Generate a sample of size 1000 from dag_school2 and use it to train the model outcome ~ expenditure. Is the coefficient on expenditure consistent with what you found in (1)? (If you aren't sure, use a larger sample size, say 10,000.) What about the coefficient on expenditure leads to your conclusion?
- Speculate on what might be the origin of the evident inconsistency between (1) and (2).
Part 4: Constructing a DAG
In this task, you will construct DAGs using dag_make() and draw them using dag_draw().
A DAG is defined by a series of tilde expressions, one for each node in the graph. The tilde expression for a node has the node's name on the left-hand side of the tilde. The right-hand side contains the nodes which serve as inputs to the node named on the left-hand side. If there are no inputs, write exo().
For example, consider a DAG with three nodes: one, two, and three. To define a DAG where node two receives input from node one, and node three receives input from nodes one and two, use dag_make() with three tilde expressions:
example_dag <- dag_make(
  one ~ exo(),
  two ~ one,
  three ~ two + one
)
dag_draw(example_dag)
The right-hand side of a formula can be any arithmetic expression involving the node names, but we will keep it simple: just use + to separate the node names. If a node receives no inputs, the right-hand side should be simply exo() to mark that node as exogenous.
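To see how a tilde expression encodes a node's inputs, here is a small sketch that uses only base R formula handling. It is not how dag_make() works internally; the names dag_spec, parents, and exogenous are hypothetical helpers introduced for this illustration.

```r
# The three tilde expressions from the example, stored as plain R formulas.
# R does not evaluate exo() here; a formula just records the expression.
dag_spec <- list(
  one   ~ exo(),
  two   ~ one,
  three ~ two + one
)

# For a two-sided formula f, f[[2]] is the left-hand side and f[[3]] the right-hand side.
# all.vars() returns the variable names in an expression, ignoring function names like exo.
parents <- lapply(dag_spec, function(f) all.vars(f[[3]]))
names(parents) <- vapply(dag_spec, function(f) as.character(f[[2]]), character(1))

# A node with no input nodes on its right-hand side is exogenous.
exogenous <- names(parents)[lengths(parents) == 0]

parents
exogenous
```

Reading the result, each list element names a node, and its contents are the nodes with arrows pointing into it, which is the same information dag_draw() shows as a picture.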
- What happens if node one, instead of being exogenous, takes as input one of the other two nodes in example_dag?
- Create and draw a DAG that has the same arrangement of causal connections as “Professor Butts and the Self-Operating Napkin,” illustrated below:
Professor Butts and the Self-Operating Napkin (1931). Soup_spoon (A) is raised to mouth, pulling string (B) and thereby jerking ladle (C), which throws cracker (D) past toucan (E). Toucan jumps after cracker and perch (F) tilts, upsetting seeds (G) into pail (H). Extra weight in pail pulls cord (I), which opens and ignites lighter (J), setting off skyrocket (K), which causes sickle (L) to cut string_m (M), allowing pendulum with attached napkin to swing back and forth, thereby wiping_chin.
Watch your spelling of node names! Use this command to draw your napkin_dag:
dag_draw(napkin_dag, vertex.label.cex=.5, vertex.size=10, edge.arrow.size = 0.2)