Potential Outcomes

Counterfactuals, Treatment Effects, and Identifying Assumptions

Professor Benjamin Noble

Does X cause Y?

Do Arcades Cause Computer Science?

Chart showing total arcade revenue and computer science doctorates awarded in the US rising and falling together over time.

Counterfactuals

“…what is not, but could or would have been” (Starr 2019).
“If a person eats of a particular dish, and dies in consequence, that is, would not have died if he had not eaten of it, people would be apt to say that eating of that dish was the cause of his death” (Mill 2010).

Causal Inference as Counterfactuals

Does AI cause layoffs?
How many more people would still have a job had AI never been developed?

News headline about executives citing AI as a top cause of layoffs.

Causal Inference as Counterfactuals

Did the Iran conflict and higher gas prices cause the decline in Trump’s approval?
Would Trump be more popular had there been no Iran conflict and no rise in gas prices?

News headline card: Trump approval falls to 35 percent as Iran war and gas prices erode Republican support.

A Problem…

It’s not so easy to do this kind of research!

Randomized Controlled Trials

Randomly assign some units to treatment, others to control.
With enough units, everything else (observed and unobserved) balances across the groups on average.
The two groups become good counterfactuals “all else equal.”

Diagram of a randomized controlled trial: a participant pool is randomly split into a treatment group and a control group. — via Cancer Council, NSW
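A minimal simulation sketch of the balance claim above. The covariate (a baseline headache severity drawn uniformly from 0 to 10), the function name `balance_gap`, and the sample sizes are all hypothetical choices made for illustration, not part of the lecture.

```python
import random
import statistics

def balance_gap(n_units, seed=0):
    """Randomly assign half of n_units to treatment and return the absolute
    treated-minus-control gap in the mean of a pre-treatment covariate."""
    rng = random.Random(seed)
    # Hypothetical pre-treatment covariate: baseline severity on a 0-10 scale.
    baseline = [rng.uniform(0, 10) for _ in range(n_units)]
    # Random assignment: shuffle unit indices and treat the first half.
    order = list(range(n_units))
    rng.shuffle(order)
    treated = order[: n_units // 2]
    control = order[n_units // 2 :]
    mean_treated = statistics.fmean(baseline[i] for i in treated)
    mean_control = statistics.fmean(baseline[i] for i in control)
    return abs(mean_treated - mean_control)

for n in (10, 100, 10_000):
    print(n, round(balance_gap(n), 3))
```

With small samples the gap can be sizable by chance; with many units it tends toward zero, which is the sense in which the two groups become comparable.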

One Way to Think About the Problem…

Potential Outcomes Framework (Splawa-Neyman 1923; Rubin 1974).
Suppose I have a headache.
- In world 1, I take aspirin and report the severity of my headache on a 0-10 scale.
- In world 2, I take no aspirin and report the severity on the same 0-10 scale.
- Causal effect of aspirin: severity in world 1 minus severity in world 2.

A New Problem Emerges…

“The fundamental problem of causal inference” (Holland 1986).

Mean Girls meme: Lindsay Lohan looking shocked, captioned 'THE COUNTERFACTUAL DOES NOT EXIST'.

Treatment Assignment

Suppose we have a binary treatment variable:

D_i = \begin{cases} 1 & i \text{ received treatment (e.g., aspirin)} \\ 0 & i \text{ received no treatment (e.g., no aspirin)} \end{cases}

Suppose we have an outcome variable, Y_i (e.g., headache severity). We then have the following potential outcomes, Y_i^D:

Y_i^D = \begin{cases} Y_i^1 & \text{outcome if } i \text{ received treatment} \\ Y_i^0 & \text{outcome if } i \text{ did not receive treatment} \end{cases}

Observed Outcome

Y_i = D_i Y_i^1 + (1 - D_i) Y_i^0

\(Y_i\) is the outcome we actually observe for unit \(i\).
When \(D_i = 1\), the treated branch is "switched on."
When \(D_i = 0\), the control branch is "switched on."
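The switching logic can be sketched in a few lines of Python. The numbers are the four-person headache example that appears later in this deck; the function name `observed_outcome` is just an illustrative choice.

```python
# Treatment indicators and potential outcomes for the four-person
# headache example used later in this deck.
D = [1, 1, 0, 0]
Y1 = [2, 4, 0, 2]  # severity if treated
Y0 = [5, 7, 1, 3]  # severity if untreated

def observed_outcome(d, y1, y0):
    """Observed outcome: Y = D * Y1 + (1 - D) * Y0."""
    return d * y1 + (1 - d) * y0

Y = [observed_outcome(d, y1, y0) for d, y1, y0 in zip(D, Y1, Y0)]
print(Y)  # [2, 4, 1, 3]
```

For treated units the result is the treated potential outcome; for untreated units it is the untreated one. The other potential outcome never shows up in the data.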

(Hypothetical) Individual Treatment Effect

\delta_i = Y_i^1 - Y_i^0

The individual treatment effect compares the two potential outcomes for the same unit.
It is hypothetical because we cannot observe both worlds for any one person.

Average Treatment Effect

ATE = \frac{1}{N}\sum_{i=1}^{N}(Y_i^1 - Y_i^0) = \mathbb{E}[Y_i^1] - \mathbb{E}[Y_i^0]

The ATE averages individual treatment effects across units.
It is still built from both potential outcomes, so it is not directly observable.

Estimating the ATE

The target (but it is built from potential outcomes we never fully observe).

ATE = \mathbb{E}[Y_i^1] - \mathbb{E}[Y_i^0]

Substitute the outcomes we actually observe in each group.

ATE = \mathbb{E}[Y_i^1 \mid D_i = 1] - \mathbb{E}[Y_i^0 \mid D_i = 0]

Valid only when treatment is independent of potential outcomes.

(Y^1, Y^0) \perp D
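One way to see why independence is enough, written out step by step as a sketch that uses only the definitions above:

\begin{aligned}
\mathbb{E}[Y_i \mid D_i = 1] - \mathbb{E}[Y_i \mid D_i = 0] &= \mathbb{E}[Y_i^1 \mid D_i = 1] - \mathbb{E}[Y_i^0 \mid D_i = 0] && \text{(observed outcome equation)} \\
&= \mathbb{E}[Y_i^1] - \mathbb{E}[Y_i^0] && \text{(independence: conditioning on } D_i \text{ changes nothing)} \\
&= ATE
\end{aligned}

The first line uses only quantities we observe. The second step is where the assumption does its work: without independence, the conditional and unconditional expectations can differ.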

Selection Bias

Let people choose treatment and the groups stop being comparable.
A naive comparison confuses the effect of treatment with who selected into it.
This is selection bias, and it lurks in any observational study.
Randomizing treatment in a large sample balances the groups on average, which removes the bias.
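A small simulation sketch of the contrast between self-selection and randomization. Everything here is a hypothetical setup chosen for illustration: untreated severity is uniform between 2 and 10, aspirin lowers severity by exactly 2 for everyone (so the true ATE is -2), and under self-selection only people with severity above 6 take aspirin.

```python
import random
import statistics

rng = random.Random(1)
N = 20_000

# Hypothetical potential outcomes: aspirin lowers severity by 2 for everyone.
Y0 = [rng.uniform(2, 10) for _ in range(N)]
Y1 = [y0 - 2.0 for y0 in Y0]

def naive_difference(assignment):
    """Mean observed outcome among the treated minus mean among the untreated."""
    treated = [y1 for y1, d in zip(Y1, assignment) if d == 1]
    control = [y0 for y0, d in zip(Y0, assignment) if d == 0]
    return statistics.fmean(treated) - statistics.fmean(control)

# Self-selection: only people with bad headaches (severity above 6) take aspirin.
self_selected = [1 if y0 > 6 else 0 for y0 in Y0]
# Randomization: a fair coin flip decides who takes aspirin.
randomized = [rng.randint(0, 1) for _ in range(N)]

selected_est = naive_difference(self_selected)
random_est = naive_difference(randomized)
print("self-selected:", round(selected_est, 2))
print("randomized:   ", round(random_est, 2))
```

Under self-selection the naive comparison comes out positive, as if aspirin made headaches worse, because the treated group started out with worse headaches. Under randomization it lands close to the true effect of -2.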

The Fundamental Problem

What world are we in?

What are we estimating?

Four people with headaches (higher Y = worse)

All potential outcomes: counterfactuals (\(Y_i^0\) for treated units, \(Y_i^1\) for untreated units, and therefore every \(\delta_i\)) are unobservable

\(i\)	\(D_i\)	\(Y_i\)	\(Y_i^1\)	\(Y_i^0\)	\(\delta_i\)
1	1	2	2	5	−3
2	1	4	4	7	−3
3	0	1	0	1	−1
4	0	3	2	3	−1

Naive difference in means \(\mathbb{E}[Y_i \mid D_i=1] - \mathbb{E}[Y_i \mid D_i=0] = \frac{2+4}{2} - \frac{1+3}{2} = +1\)
ATE \(\frac{1}{N}\sum_i \delta_i = \frac{(-3)+(-3)+(-1)+(-1)}{4} = -2\)
ATT (treated only) \(\frac{2+4}{2} - \frac{5+7}{2} = -3\)
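The three numbers above can be checked directly from the table. The sketch below also splits the naive difference into the ATT plus a selection-bias term (the gap in untreated outcomes between the two groups). That decomposition is a standard identity, not something stated on the slide, and the variable names are illustrative.

```python
from statistics import fmean

# The four-person table: treatment, and both potential outcomes.
D = [1, 1, 0, 0]
Y1 = [2, 4, 0, 2]
Y0 = [5, 7, 1, 3]
Y = [d * y1 + (1 - d) * y0 for d, y1, y0 in zip(D, Y1, Y0)]

treated = [i for i, d in enumerate(D) if d == 1]
control = [i for i, d in enumerate(D) if d == 0]

# What we can compute from observed data alone.
naive = fmean(Y[i] for i in treated) - fmean(Y[i] for i in control)

# What requires the unobservable counterfactuals.
ate = fmean(y1 - y0 for y1, y0 in zip(Y1, Y0))
att = fmean(Y1[i] - Y0[i] for i in treated)
selection_bias = fmean(Y0[i] for i in treated) - fmean(Y0[i] for i in control)

print(naive, ate, att, selection_bias)  # 1.0 -2.0 -3.0 4.0
assert naive == att + selection_bias
```

The treated group would have had much worse headaches without aspirin (6 versus 2 on average), and that +4 gap swamps the true -3 effect among the treated, leaving a naive estimate of +1.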

via Christopher Lucas

Comparing Our Estimates

Observed comparison

Naive difference in means

+1

Aspirin looks harmful.

Hidden-world truth

Average treatment effect

−2

Aspirin helps on average.

Among the treated

ATT

−3

It helps the treated most.

A Few Other Assumptions: SUTVA

SUTVA: Stable Unit Treatment Value Assumption.
- Same treatment “dosage.”
- No externalities or spillovers.

Illustration of a person canvassing at a door, showing a possible spillover to another person nearby.

A Few Other Assumptions: SUTVA

SUTVA: Stable Unit Treatment Value Assumption.
- Same treatment “dosage.”
- No externalities or spillovers.
- No general equilibrium effects.

Illustration of door-to-door canvassing repeated widely, showing how repeated treatment can change the broader environment.
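One common way to write the no-spillover part of SUTVA formally, sketched with notation that goes beyond the slides: let \(Y_i(D_1, \dots, D_N)\) denote unit \(i\)'s outcome under the full assignment of treatment to all \(N\) units. SUTVA then says that only unit \(i\)'s own treatment matters:

Y_i(D_1, \dots, D_N) = Y_i(D_i)

This is what allows each unit to have just two potential outcomes, \(Y_i^1\) and \(Y_i^0\). If a neighbor's treatment could change \(i\)'s outcome, there would be a different potential outcome for every possible assignment of everyone else.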

Taking Stock

Causal inference is about identification strategy and research design, both of which depend on how treatment is assigned. It is not about fancy machine learning methods.

Strongest: treatment is randomized by the researcher.
Strong: treatment is as-if random.
Mid: treatment is as-if random after controls.
Weakest: treatment is self-selected and important confounders remain unmeasured.