Causal Inference & DAGs

3/11/25

This workshop is plagiarism!!

  • almost all of this content comes from Statistical Rethinking, a textbook and free online course by Richard McElreath
  • a good portion also comes from The Book of Why and other works by Judea Pearl
  • there are many scholars, ecologists and otherwise, who use this method and explain it better than I ever will - resources at the end

Let’s not panic

  • the beauty of causal inference is that it relies on concepts that come very naturally to the human brain and is founded on using the expert scientific knowledge that every scientist brings to their studies

  • THIS DOES NOT CHANGE EVERYTHING - just gives you a framework to easily express what you already feel and know

What is causal inference?

  • the study of causes and effects: does X cause a change in Y?

  • is this different from correlation?

Causality by Judea Pearl

Why don’t we talk or learn about causation?

  • Pearson & Galton, founders of modern statistics, failed to create the tools needed for causal inference and subsequently decided that it was impossible and “unscientific”

    • they used their enormous influence to teach generations of scientists this and attack anyone who opposed them
  • Judea Pearl invented the math required to answer causal questions only ~ 40 years ago! Science is slow!

  • causation is not controversial - we are just transitioning

What is causal inference NOT?

  • prediction!! forecasting!!
  • if we want to use our models to estimate data in places or times that we do not have data for, but we DO NOT CARE about the relationships between the things in our model, that is prediction and not causal inference

  • prediction is cool!! it is, however, not what we do in our lab (for now…)

  • AIC is a tool for measuring the predictive power of your model - it is not appropriate for our purposes

Who uses causal inference?

When do people use causal inference?

Level 1: association

  • how are the variables related? how does observing X shift my belief in Y?

  • example: what does canopy cover tell us about air temperature?

Judea Pearl’s Book of Why

Level 2: intervention

  • what would Y be if I do X?

  • example: how will bird species richness change if I move from a park to a backyard?

Judea Pearl’s Book of Why

Level 3: counterfactuals

  • what if X had not occurred? is it X that caused Y?

  • example: would survey respondents prefer different green space features if they lived in a different borough?

Judea Pearl’s Book of Why

How do I do causal inference?

  • DAGs (directed acyclic graphs)!

  • arrows indicate a causal relationship from one variable to another

  • use your expert knowledge + literature to outline your system with your hypotheses and assumptions (you already make assumptions now, you just don’t visualize them!)

  • adjust your statistical test (e.g., model) using your DAG

Why do DAGs matter?

  • putting everything in your model does not test the relationship(s) you are interested in
  • complex systems have confounders that mislead us and that we need to adjust for
  • adjustments are dependent on our DAG and the variable of interest

Statistical Rethinking, Lecture 5

Confounders: fork

  • Z is a common cause of both X and Y

  • X and Y are associated

  • Once stratified by Z, X and Y have no association

Statistical Rethinking, Lecture 5
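
  • a minimal simulated sketch of the fork, assuming made-up effect sizes (0.8) and no true effect of X on Y; nothing here is real data:

```r
# simulated fork: Z causes both X and Y; X has NO effect on Y
# (variable names and effect sizes are assumptions for illustration)
set.seed(1)
n <- 5000
Z <- rnorm(n)
X <- 0.8 * Z + rnorm(n)
Y <- 0.8 * Z + rnorm(n)

# ignoring Z: X and Y look associated
b_unadjusted <- unname(coef(lm(Y ~ X))["X"])

# stratifying by (adjusting for) Z: the association should shrink towards zero
b_adjusted <- unname(coef(lm(Y ~ X + Z))["X"])

round(c(unadjusted = b_unadjusted, adjusted = b_adjusted), 2)
```

  • the slope on X should be clearly positive without Z and close to zero with Z, even though X never influences Y in the simulation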

EXAMPLE: the effect of canopy cover on temperature (simplified)

library(ggdag)

fork_dag <- dagify(
  temp ~ canopy + SVF,
  canopy ~ SVF,
  labels = c(
    "temp" = "Temperature",
    "canopy" = "Canopy Cover",
    "SVF" = "Sky View Factor"
  ),
  coords = list(x = c(canopy = -1, SVF = 0, temp = 1), 
                y = c(canopy = 1, SVF = 0, temp = 1)),
  exposure = "canopy",
  outcome = "temp"
)

ggdag(fork_dag, text = FALSE, use_labels = "label") + theme_dag()

  • how do we know how to adjust our model?
ggdag_adjustment_set(fork_dag, effect = "direct", text = FALSE, use_labels = "label", shadow = TRUE) + 
  theme_dag()

  • our DAG shows us that if we want to understand the effect of canopy cover on temperature, we need to add sky view factor to our model (“adjust for sky view factor”)
  • our model may look something like this:
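library(lme4)

# canopy is the exposure; SVF is the adjustment set given by the DAG
# (1|date) adds a random intercept for each sampling date
temp_model <- lmer(temperature ~ canopy + SVF + (1|date),
                   data = temp_df)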

Confounders: pipe

  • what if we want to use the same example but understand the effect of SVF on temperature?
  • this is now a different type of confound, a pipe!
  • X and Y are associated

  • influence of X on Y is transmitted through Z

  • Once stratified by Z, X and Y have no association

Statistical Rethinking, Lecture 5
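
  • a minimal simulated sketch of a pure pipe (X -> Z -> Y), assuming made-up effect sizes and no direct arrow from X to Y:

```r
# simulated pipe: X influences Y only through Z
# (variable names and effect sizes are assumptions for illustration)
set.seed(2)
n <- 5000
X <- rnorm(n)
Z <- 0.8 * X + rnorm(n)
Y <- 0.8 * Z + rnorm(n)

# total effect of X on Y: leave the mediator Z out of the model
b_total <- unname(coef(lm(Y ~ X))["X"])

# adding Z blocks the pipe, so the slope on X should shrink towards zero
b_blocked <- unname(coef(lm(Y ~ X + Z))["X"])

round(c(total = b_total, blocked = b_blocked), 2)
```

  • the real effect of X is only visible when Z is left out, which is why adding a mediator can hide the effect you are trying to estimate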

  • the effect of SVF on temperature is both a direct effect and an effect that pipes through canopy
  • if we want to understand the total effect of sky view factor on temperature, we DO NOT add canopy to our models
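  • the call below uses pipe_dag, which is not defined in these notes; a plausible definition, assuming it is the same DAG as fork_dag with SVF set as the exposure:

```r
library(ggdag)

# ASSUMPTION: pipe_dag mirrors fork_dag, but the exposure is now SVF
pipe_dag <- dagify(
  temp ~ canopy + SVF,
  canopy ~ SVF,
  labels = c(
    "temp" = "Temperature",
    "canopy" = "Canopy Cover",
    "SVF" = "Sky View Factor"
  ),
  coords = list(x = c(canopy = -1, SVF = 0, temp = 1),
                y = c(canopy = 1, SVF = 0, temp = 1)),
  exposure = "SVF",
  outcome = "temp"
)
```
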
ggdag_adjustment_set(pipe_dag, effect = "total", text = FALSE, use_labels = "label", shadow = TRUE) + 
  theme_dag()

  • there is no adjustment set! everything rests “unadjusted”
  • because the effect of SVF goes through canopy, adding canopy to your model blocks that effect
  • your model might look like this:
temp_model <- lmer(temperature ~ SVF + (1|date),
                   data = temp_df)

Confounders: collider

  • X and Y are not associated (share no causes)

  • X and Y both influence Z

  • Once stratified by Z, X and Y appear associated

Statistical Rethinking, Lecture 5
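
  • a minimal simulated sketch of a collider, assuming X and Y are generated independently and both feed into Z (all numbers made up):

```r
# simulated collider: X and Y are independent, both influence Z
# (variable names and effect sizes are assumptions for illustration)
set.seed(3)
n <- 5000
X <- rnorm(n)
Y <- rnorm(n)
Z <- X + Y + rnorm(n)

# without Z: no association between X and Y, as simulated
b_unadjusted <- unname(coef(lm(Y ~ X))["X"])

# stratifying by the collider Z creates an association that does not exist
b_collider <- unname(coef(lm(Y ~ X + Z))["X"])

round(c(unadjusted = b_unadjusted, with_collider = b_collider), 2)
```

  • the slope on X should sit near zero without Z and become clearly negative once Z is in the model, purely as an artifact of the adjustment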

EXAMPLE: public and private tree species richness

  • both public tree species richness and private tree species richness contribute to the urban forest’s functional diversity
  • however, public tree species richness does not influence private tree species richness and vice versa

  • BUT if we have a model for private tree species richness with both urban forest functional diversity and public tree species richness included, an association between public and private species richness will appear when it does not truly exist

  • “spurious correlation”
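
  • the same logic can be checked on the DAG itself, without any data; a small sketch using dagitty's d-separation test, where the variable names are hypothetical shorthand for this example:

```r
library(dagitty)

# hypothetical variable names for the public / private tree example
tree_dag <- dagitty("dag {
  public_richness -> functional_diversity
  private_richness -> functional_diversity
}")

# with nothing adjusted: are public and private richness independent?
sep_unadjusted <- dseparated(tree_dag, "public_richness", "private_richness", c())

# after adjusting for the collider: are they still independent?
sep_adjusted <- dseparated(tree_dag, "public_richness", "private_richness",
                           "functional_diversity")

c(unadjusted = sep_unadjusted, adjusted = sep_adjusted)
```

  • under this DAG the two richness variables should be d-separated only while functional diversity is left out of the model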

Confounders: descendant

  • a descendant behaves differently depending on which variable it is attached to

  • X and Y are causally associated through Z

  • A holds information about Z

  • if stratified by A, X and Y are less associated

Statistical Rethinking, Lecture 5
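
  • a minimal simulated sketch of a descendant, assuming a pipe X -> Z -> Y with A as a noisy child of Z (all numbers made up):

```r
# simulated descendant: A carries partial information about the mediator Z
# (variable names and effect sizes are assumptions for illustration)
set.seed(4)
n <- 5000
X <- rnorm(n)
Z <- 0.8 * X + rnorm(n)
A <- 0.8 * Z + rnorm(n)   # descendant of Z
Y <- 0.8 * Z + rnorm(n)

b_none <- unname(coef(lm(Y ~ X))["X"])      # nothing adjusted: total effect
b_A    <- unname(coef(lm(Y ~ X + A))["X"])  # adjusting for the descendant
b_Z    <- unname(coef(lm(Y ~ X + Z))["X"])  # adjusting for Z itself

round(c(none = b_none, descendant = b_A, mediator = b_Z), 2)
```

  • adjusting for A should weaken the X slope without removing it, while adjusting for Z itself should remove it almost entirely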

  • we know from before that we shouldn’t add proportion of native trees to our model if we want to test the effect of land use type on bird behaviour, because it’s a pipe
  • because proportion of invasive trees is a descendant, adding it to the model will have the same effect but weaker
  • descendants can be used as proxies for our variables of interest

Adjusting your models for confounders

  • to test the effect of X on Y, we need to identify which variables we need to adjust for (aka add to the model) to block all confounding paths

  • confounders are complex when there are more than 4 or 5 variables in the system

  • to figure out which variables you need to adjust for, we can use dagitty!
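
  • a short sketch of asking dagitty for adjustment sets as text, assuming the simplified canopy / SVF / temperature DAG from the earlier example written in dagitty syntax:

```r
library(dagitty)

# ASSUMPTION: the simplified canopy example, rewritten in dagitty syntax
canopy_dag <- dagitty("dag {
  SVF -> canopy
  SVF -> temp
  canopy -> temp
}")

# effect of canopy on temperature: which variables close the confounding path?
canopy_sets <- adjustmentSets(canopy_dag, exposure = "canopy", outcome = "temp")

# total effect of SVF on temperature
svf_total_sets <- adjustmentSets(canopy_dag, exposure = "SVF", outcome = "temp",
                                 effect = "total")

# direct effect of SVF on temperature (the part that does not go through canopy)
svf_direct_sets <- adjustmentSets(canopy_dag, exposure = "SVF", outcome = "temp",
                                  effect = "direct")

print(canopy_sets)
print(svf_total_sets)
print(svf_direct_sets)
```

  • the same DAG gives different adjustment sets depending on the exposure and on whether you want the total or the direct effect, so the question has to be fixed before the model is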

Some DAG notes / a petit sermon

  1. variables that do not have shared causes in your system do not need to be included - your DAG does not need to include every variable in the world

  2. do NOT exclude variables just because you haven’t measured them: they are still potential confounders and need to be part of your DAG!

  3. you are an expert with good intuition, so don’t be scared to put your assumptions down on paper

  4. presenting the assumptions you are making about your system is good, transparent science and allows the development of the field

    • !! you are doing this anyways !! when you decide what variables to collect / what to include in your models, you are just being less transparent about it! we must always do our best and be brave!

Table 2 fallacy

dagitty.net - crowd-sourced example

Resources