Non Parametric Causal Inference with PyMC

Propensity Scores, Debiased ML and Causal Mediation

Nathaniel Forde

Data Science @ Personio

and Open Source Contributor @ PyMC

3/1/24

Preliminaries

Profile

I’m a data scientist at Personio
- Bayesian statistician,
- Reformed philosopher and logician.
Website: https://nathanielf.github.io/

Disclaimer

None of Personio’s data was used in this presentation

Code or it didn’t Happen

The worked examples used here can be found here

The Pitch

Agnostic statistical methods in causal inference are well motivated but limited in scope.

We’ll show how Bayesian non-parametrics are a natural framework in which to couch and extend these methods.

Primary Debts and Sources

Agenda

Propaganda

Full Luxury Bayesianism

Posterior Predictive Imputation of Treatment Effects

Non Parametric Bayes Formula

Full Luxury Bayesianism

\[ p(\color{blue}{\theta} |D) \propto p(D | \color{blue}{\theta})p(\color{blue}{\theta}) \]

where \(\color{blue}{\theta}\) is an explicit model parameter becomes

\[ p(\color{blue}{G} |D) \propto p(D | \color{blue}{G})p(\color{blue}{G}) \]

where \(\color{blue}{G}\) is a general stochastic process

Three Acts

Propensity Scores and Non-Parametric Causal Inference

Confounding and Debiasing

Causal Structure and Mediation

Act One

Propensity Scores and Non-Parametric Causal Inference

Strong Ignorability and Propensity Scores

BART models and Non-Parametric Estimation

Balance and Inverse Propensity Weighting

Robust and Doubly Robust methods

Strong Ignorability and Propensity Scores

Definitions

Potential Outcomes
- \(Y(0)\) and \(Y(1)\) under different treatment regimes \(T \in \{ 0, 1\}\)
Strong Ignorability
- Outcomes are independent of the treatment assignment given a covariate profile \(X\): \(Y(0), Y(1) \perp\!\!\!\perp T | X\)
Propensity Scores
- An estimate of the probability of a particular treatment status conditional on the covariate profile \(X\): \(0 \leq p_{t}(X) \leq 1\)
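
As a minimal sketch of the last definition, assuming a single discrete covariate, the propensity score can be estimated as the treated share within each covariate profile. The toy rows and the helper `propensity_by_stratum` are hypothetical illustrations, not taken from the worked examples; the talk itself estimates these scores with BART.

```python
from collections import defaultdict

# Hypothetical toy data: (covariate profile x, treatment indicator t)
rows = [
    ("young", 1), ("young", 0), ("young", 0), ("young", 0),
    ("old", 1), ("old", 1), ("old", 1), ("old", 0),
]

def propensity_by_stratum(rows):
    """Estimate p_T(X) as the share of treated units within each profile."""
    treated = defaultdict(int)
    total = defaultdict(int)
    for x, t in rows:
        treated[x] += t
        total[x] += 1
    return {x: treated[x] / total[x] for x in total}

propensity = propensity_by_stratum(rows)
print(propensity)  # {'young': 0.25, 'old': 0.75}
```

Here treatment is far more likely for the "old" profile, which is exactly the kind of selection that the adjustments in this act try to undo.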

BART Models and Non-Parametric Estimation

BART
- Bayesian Additive Regression Trees
- “[B]lack-box method based on the sum of many trees where priors are used to regularize inference”
Non-Parametric Estimation
- Outcomes and Propensity Scores can be estimated using non-parametric methods
- Causal estimands can be estimated using posterior predictive imputation under different treatment regimes
- Benefit of minimalist structural assumptions.
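
The imputation idea can be sketched without any Bayesian machinery. The sketch below assumes a hypothetical toy data set and uses stratum means as a stand-in for the outcome model (the talk uses BART); it predicts every unit's outcome under both treatment regimes and averages the differences. In a Bayesian version the same imputation would be repeated for each posterior draw, giving a distribution over the effect rather than a single number.

```python
from collections import defaultdict

# Hypothetical toy data: (covariate profile x, treatment t, outcome y)
data = [
    ("young", 1, 3.0), ("young", 0, 1.0), ("young", 0, 1.0), ("young", 0, 1.0),
    ("old", 1, 6.0), ("old", 1, 6.0), ("old", 1, 6.0), ("old", 0, 2.0),
]

def fit_outcome_model(data):
    """Stand-in outcome model: mean outcome per (profile, treatment) cell."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, t, y in data:
        sums[(x, t)] += y
        counts[(x, t)] += 1
    return {key: sums[key] / counts[key] for key in sums}

def impute_ate(data, model):
    """Impute both potential outcomes for every unit and average the gap."""
    diffs = [model[(x, 1)] - model[(x, 0)] for x, _, _ in data]
    return sum(diffs) / len(diffs)

model = fit_outcome_model(data)
ate_imputed = impute_ate(data, model)

# Naive comparison of group means, ignoring the covariate profile
treated = [y for _, t, y in data if t == 1]
control = [y for _, t, y in data if t == 0]
ate_naive = sum(treated) / len(treated) - sum(control) / len(control)

print(ate_imputed, ate_naive)  # 3.0 4.0
```

The naive difference in means overstates the effect here because the treated group is dominated by the profile with the larger outcomes.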

Inverse Propensity Score Weighting

Adjustment by representative Weighting
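- Using the propensity score as a summary of group membership, we up-weight units whose observed treatment status was unlikely given their covariates and down-weight units whose status was likely, so that the reweighted treatment and control groups have similar covariate profiles and approximate the conditions of strong ignorability.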
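
A minimal sketch of how the weights restore balance, assuming hypothetical toy data and propensity scores equal to the treated share within each profile. The helper names `ipw_weight` and `share_old` are illustrative only.

```python
# Hypothetical toy data: (covariate profile x, treatment indicator t)
rows = [
    ("young", 1), ("young", 0), ("young", 0), ("young", 0),
    ("old", 1), ("old", 1), ("old", 1), ("old", 0),
]

# Assumed propensity scores: treated share within each profile above
propensity = {"young": 0.25, "old": 0.75}

def ipw_weight(x, t):
    """Inverse probability of the treatment status actually observed."""
    p = propensity[x]
    return 1.0 / p if t == 1 else 1.0 / (1.0 - p)

def share_old(rows, group, weighted):
    """Share of 'old' units within a treatment group, optionally weighted."""
    numerator = denominator = 0.0
    for x, t in rows:
        if t != group:
            continue
        w = ipw_weight(x, t) if weighted else 1.0
        if x == "old":
            numerator += w
        denominator += w
    return numerator / denominator

for group in (1, 0):
    print(group, share_old(rows, group, False), share_old(rows, group, True))
```

Before weighting, "old" units make up three quarters of the treated group and one quarter of the control group. After weighting, both groups are roughly half "old", matching the full sample.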

Robust and Doubly Robust Methods

Differing Weighting Schemes

Raw
- \(\frac{1}{N}\sum_{i=1}^{N}\Big[ \frac{T_{i} Y_{i}}{p_{T}(X_{i})} - \frac{(1-T_{i}) Y_{i}}{1-p_{T}(X_{i})} \Big]\)
Doubly Robust
- \[ \widehat{Y(1)} = \frac{1}{N} \sum_{i=1}^{N} \Bigg[ \frac{T_{i}(Y_{i} - m_{1}(X_{i}))}{p_{T}(X_{i})} + m_{1}(X_{i}) \Bigg] \\ \widehat{Y(0)} = \frac{1}{N} \sum_{i=1}^{N} \Bigg[ \frac{(1-T_{i})(Y_{i} - m_{0}(X_{i}))}{1-p_{T}(X_{i})} + m_{0}(X_{i}) \Bigg] \]
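
Both weighting schemes can be sketched in a few lines. The example assumes hypothetical toy data in which the "good" propensity and outcome models are simply the observed cell shares and cell means, and the "bad" models are deliberately uninformative (a flat propensity of 0.5 and an outcome model that always predicts 0). All names are illustrative.

```python
# Hypothetical toy data: (covariate profile x, treatment t, outcome y)
data = [
    ("young", 1, 3.0), ("young", 0, 1.0), ("young", 0, 1.0), ("young", 0, 1.0),
    ("old", 1, 6.0), ("old", 1, 6.0), ("old", 1, 6.0), ("old", 0, 2.0),
]

# Assumed well-specified models: cell shares and cell means of the toy data
good_p = {"young": 0.25, "old": 0.75}
good_m = {("young", 1): 3.0, ("young", 0): 1.0, ("old", 1): 6.0, ("old", 0): 2.0}

# Deliberately misspecified models
bad_p = {"young": 0.5, "old": 0.5}
bad_m = {key: 0.0 for key in good_m}

def raw_ipw_ate(data, p):
    """Raw inverse propensity weighted estimate of the average effect."""
    total = 0.0
    for x, t, y in data:
        total += t * y / p[x] - (1 - t) * y / (1.0 - p[x])
    return total / len(data)

def doubly_robust_ate(data, p, m):
    """Doubly robust estimate: outcome model plus weighted residual correction."""
    y1 = y0 = 0.0
    for x, t, y in data:
        y1 += t * (y - m[(x, 1)]) / p[x] + m[(x, 1)]
        y0 += (1 - t) * (y - m[(x, 0)]) / (1.0 - p[x]) + m[(x, 0)]
    return (y1 - y0) / len(data)

print(raw_ipw_ate(data, good_p))              # approximately 3.0
print(doubly_robust_ate(data, good_p, bad_m)) # approximately 3.0
print(doubly_robust_ate(data, bad_p, good_m)) # approximately 3.0
print(doubly_robust_ate(data, bad_p, bad_m))  # approximately 4.0
```

In this toy setting the doubly robust estimate stays at 3.0 when either model is replaced by its misspecified version, and falls back to the naive difference in means (4.0) only when both are wrong.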

Go to Code

Act Two

Confounding and Debiasing

Miscalibrated Propensity Scores

BART models and Overfitting

Debiased Machine Learning

CATE estimation

Miscalibrated Propensity Scores

What is the Estimand?

“Each theoretical estimand is linked to an empirical estimand involving only observable quantities (e.g. a difference in means in a population) by assumptions about the relationship between the data we observe and the data we do not.” - Lundberg et al., “What Is Your Estimand?”

Q1. What are we aiming at when we estimate propensity scores for highly granular covariate profiles?
Q2. What happens when the sample contains no treated cases for a particular covariate profile?

Overfitting

BART models can achieve perfect Allocation

Debiasing Machine Learning

K-fold Propensity Estimation
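
One way to read K-fold propensity estimation is as cross-fitting: each unit's propensity score comes from a model that never saw that unit's own treatment status. The sketch below assumes hypothetical toy data, interleaved folds, and a simple stratum-share model with a fallback to the overall treated share for unseen profiles. These are illustrative choices, not the talk's BART-based implementation.

```python
# Hypothetical toy data: (covariate profile x, treatment indicator t)
rows = [
    ("young", 1), ("young", 0), ("young", 0), ("young", 0),
    ("old", 1), ("old", 1), ("old", 1), ("old", 0),
]

def fit_propensity(train_rows):
    """Stand-in propensity model: treated share per profile in the training rows."""
    treated, total = {}, {}
    for x, t in train_rows:
        treated[x] = treated.get(x, 0) + t
        total[x] = total.get(x, 0) + 1
    overall = sum(t for _, t in train_rows) / len(train_rows)
    shares = {x: treated[x] / total[x] for x in total}
    # Assumed fallback: use the overall treated share for unseen profiles
    return lambda x: shares.get(x, overall)

def cross_fitted_propensity(rows, k=2):
    """Out-of-fold propensity scores using interleaved folds (index mod k)."""
    scores = [None] * len(rows)
    for fold in range(k):
        train = [row for i, row in enumerate(rows) if i % k != fold]
        predict = fit_propensity(train)
        for i, (x, _) in enumerate(rows):
            if i % k == fold:
                scores[i] = predict(x)
    return scores

oof_scores = cross_fitted_propensity(rows, k=2)
print(oof_scores)  # [0.0, 0.5, 0.0, 0.5, 0.5, 1.0, 0.5, 1.0]
```

On a sample this small the out-of-fold scores are noisy and some reach 0 or 1, which makes inverse weights undefined. That is the overlap problem raised in Q2 above and is one reason real applications need many more units per profile.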

CATE Estimation

Conditional Average Treatment Effect
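
The conditional average treatment effect is the average effect within a covariate profile, \(E[Y(1) - Y(0) | X = x]\). A minimal sketch, assuming a hypothetical fitted outcome model stored as cell means (all values are illustrative):

```python
# Hypothetical fitted outcome model: expected outcome per (profile, treatment)
outcome_model = {
    ("young", 1): 3.0, ("young", 0): 1.0,
    ("old", 1): 6.0, ("old", 0): 2.0,
}

# Hypothetical covariate profiles of the units in the sample
profiles = ["young"] * 4 + ["old"] * 4

def cate(profile, model):
    """Imputed effect for one profile: E[Y(1) - Y(0) | X = profile]."""
    return model[(profile, 1)] - model[(profile, 0)]

cate_by_profile = {x: cate(x, outcome_model) for x in sorted(set(profiles))}
ate = sum(cate(x, outcome_model) for x in profiles) / len(profiles)

print(cate_by_profile)  # {'old': 4.0, 'young': 2.0}
print(ate)              # 3.0
```

Averaging the profile-level effects over the sample recovers the overall average treatment effect, while the per-profile values show how the effect varies across the population.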

Go to Code

Act Three

Causal Structure and Mediation

Parametric Mediation

Non-Parametric Mediation

Escalating Structural Assumptions and Bayesian Inference

Parametric Mediation

Traditional Model Based Mediation

Causal Mediation Analysis

Non-Parametric Estimation

NDE: \(E[Y(t, M(t^{*})) - Y(t^{*}, M(t^{*}))]\)
- That is, the natural direct effect: the difference in imputed outcomes under the two treatments while the mediator M is held at the values it would take under one fixed treatment regime \(t^{*}\).
NIE: \(E[Y(t, M(t)) - Y(t, M(t^{*}))]\)
- That is, the natural indirect effect: the difference in imputed outcomes Y under a fixed treatment \(t\) that is due only to the change in the mediator values M generated by the different treatment regimes.
TE: NDE + NIE
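
The decomposition can be checked on a deterministic toy model. The sketch assumes hypothetical linear structural equations for the mediator and the outcome, with made-up coefficients. In the non-parametric Bayesian version these functions would be replaced by posterior predictive draws and the differences averaged over units and draws.

```python
# Hypothetical structural equations (coefficients are illustrative only)
def mediator(t):
    """M(t): value of the mediator under treatment t."""
    return 1.0 + 2.0 * t

def outcome(t, m):
    """Y(t, m): outcome under treatment t with the mediator set to m."""
    return 3.0 * t + 0.5 * m

t, t_star = 1, 0

# Natural direct effect: change the treatment, hold the mediator at M(t*)
nde = outcome(t, mediator(t_star)) - outcome(t_star, mediator(t_star))

# Natural indirect effect: hold the treatment at t, change the mediator
nie = outcome(t, mediator(t)) - outcome(t, mediator(t_star))

# Total effect: change both the treatment and the mediator it generates
te = outcome(t, mediator(t)) - outcome(t_star, mediator(t_star))

print(nde, nie, te)  # 3.0 1.0 4.0
```

The total effect equals the sum of the direct and indirect effects, as in the identity above.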

Go to Code

Conclusions

Structural Beliefs and Bayesian Inference

Propensity score adjustment reflects the belief in the need for adjustment
- It reflects a belief in the adequacy of the propensity score model for achieving balance
Regression-based imputation of treatment effects reflects the belief that covariate controls eliminate selection effects
Doubly Robust methods reflect the belief that either the propensity model or the outcome model may be misspecified.
- But that at least one of them is adequately specified.
Debiased ML methods reflect the belief that misspecification can be corrected by cross-validation and non-parametric estimation of the residuals used in the Frisch-Waugh-Lovell (FWL) theorem
Mediation Analysis reflects the belief that causal influence must be interpreted within a particular causal structure to avoid confounding.
There are no truly agnostic statistics: in each case, sound inference proceeds as we adjust our model, or the conditions of its assessment, as informed by our best beliefs about the problem at hand.