Continuous Treatment in Propensity Score Analysis

In my qualifying exam, in the written part, I was asked about how to analyze the effect of continuous, not binary, treatment using propensity score analysis. I skipped it for the written but I spent a few days looking up how to analyze this in case I would be asked during my oral examination. Sadly, no one asked me even when I asked them to, so here is a blog detailing my explorations.

Binary Treatment

For a review of propensity score analysis with binary treatment, please see Stuart (2010). Below let \(e(X)\) denote propensity scores, \(D\) denote a binary treatment, and \(X\) denote all the observed confounders. In the case of binary treatment, propensity scores represent the probability of receiving treatment given the covariates:

\[e(X) = P(D = 1|X)\]

We estimate the scores using logistic regression or machine learning techniques like generalized boosted models.

Extension to Contiuous Treatment

In binary treatment context, we assume that the potential outcomes (\(Y(1)\) and \(Y(0)\)) are independent of treatment given \(X\):

\[Y(1), Y(0) \perp\!\!\!\perp D |X\]

and by extension are independent given the propensity scores:

\[Y(0), Y(1) \perp\!\!\!\perp D|e(X)\]

Hirano and Imbens (2004) introduced the assumption of weak unconfoundedness in the context of continuous treatment. They stated: “we do not require joint independence of all potential outcomes. Instead, we require conditional independence to hold for each value of the treatment.” Below, let \(T\) denote a continuous treatment variable. The potential outcome when \(T = t\) is unreated to the treatment given the set of covariates:

\[Y(t) \perp\!\!\!\perp T |X\]

To calculate the propensity scores, in the case of continuous treatment, we cannot find the probability that continuous treatment (\(T\)) equals a given value \(t\). The likelihood of continuous variables taking on a given value is zero. For continuous treatment variable, we find the conditional density, the probability that \(T\) is infinitely close to \(t\) given \(X\). Below let \(r(t,x)\) denote the propensity scores. The right hand side of the equation represents the probability density function of a normal distribution. To estimate the propensity scores, we need to run a linear regression predicting the treatment by a set of covariates (Austin, 2019). From that we get the fitted values (\(X\hat{\beta}\)) and the model variance (\({\sigma}^2\)) (Austin, 2019). The fitted values take the place of the mean in the density function.

\[ r(t, x) = {f_{T|X}}^{(t|x)} = \frac{1}{\sqrt{2\pi\hat{\sigma}^2}} e^{-\frac{(t - X\hat{\beta})^2}{2\pi\hat{\sigma}^2}}\]

Conditional on the propensity scores, we can assume that each potential outcome is independent of treatment:

\[Y(t) \perp\!\!\!\perp T |r(t,x)\]

Hirano and Imbens (2004) state that: “Within strata with the same value of \(r(t,X)\), the probability that \(T = t\) does not depend on the value of \(X\).” I have seen \(1\) and \(I\) in front of the \((T = t)\), denoting the indicator function (Hirano & Imbens, 2004; Bia & Mattei, 2008).

\[X \perp\!\!\!\perp 1(T = t)|r(t,x)\]

Calculating Weights

Following the same logic as the inverse propensity weights (IPW) for the estimation of the average treatment effect (ATE) for a binary treatment, we calculate the inverse of the propensity scores as the weights:

\[\frac{1}{{f_{T|X}}^{(t|x)}}\]

However, Robins et al. (2000) noted that such weights can result in infinite variance (Austin, 2019). They suggested to use stabilized weights as follows:

\[\frac{{f_{T}}^{(t)}}{{f_{T|X}}^{(t|x)}}\]

Here the numerator represents the marginal density of treatment:

\[{f_{T}}^{(t)} = \frac{1}{\sqrt{2\pi\hat{\sigma_t}^2}} e^{-\frac{(t - \mu_t)^2}{2\pi\hat{\sigma_t}^2}}\]

The stabilized weights make the distribution of the IPW narrower as there is less difference between the numerators and the denominators (van der Wal & Geskus, 2011).

Real Data Analysis Example

The data that I use here is from High School and Beyond (HSB) longitudinal study used by Rosenbaum (1986) to analyze the effect of dropping out of high school on later math achievement. The missing data in the original dataset have been replaced with one iteration of imputation using mice (van Buuren & Groothuis-Oudshoorn, 2011). This is not an appropriate method to analyze missing data but for the purpose of the example I am just using the one complete data. For the sake of this example, let’s analyze the effect of math efficacy on later math achievement.

Loading the Data

library(tidyverse)

dat <- read_csv("https://raw.githubusercontent.com/meghapsimatrix/datasets/master/causal/HSLS09_complete.csv")

The Numerators

Here I am getting the numerators of the IPW, the marginal densities. I have regressed math_efficacy on just the intercept and used dnorm function to extract the densities.

# the numerator
mod_num <- lm(math_efficacy ~ 1, data = dat)

num <- dnorm(x = dat$math_efficacy, # treatment 
             mean = fitted.values(mod_num), # fitted values
             sd = summary(mod_num)$sigma) # model sigma

The Denominators

Here I am getting the denominators of the IPW, the conditional densities. I have regressed math_efficacy on \(X\) and used dnorm function to extract the densities. I am not quite sure whether to use the model sigma which divides the sum of errors squared by the degrees of freedom before taking the square root or whether I should just take the standard deviation of the errors. However, with large sample size the difference between the two are negligible.

# the demonimator
mod_den <- lm(math_efficacy ~ sex + race + language + repeated_grade + IEP + locale + region + SES, data = dat)

den <- dnorm(x = dat$math_efficacy, # treatment variable
             mean = fitted.values(mod_den), # fitted values
             sd = summary(mod_den)$sigma)

The IPW

Below I calculate the stabilized weights:

dat <- dat %>%
  mutate(ipw_s = num/den)

summary(dat$ipw_s)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1398  0.9186  0.9778  1.0001  1.0390  5.9782

Checking Balance and Outcome Analysis

Short story: For balance, we have to calculate weighted correlations, and for outcome analysis we estimate the expected outcome for each treatment level and compare (Austin, 2019).

References

Bia, M., & Mattei, A. (2008). A Stata package for the estimation of the dose-response function through adjustment for the generalized propensity score. The Stata Journal, 8(3), 354-373.

Hirano K and Imbens GW. The propensity score with continuous treatments. In: Gelman A and Meng X-L (eds) Applied Bayesian modeling and causal inference from incomplete-data perspectives. Chichester: John Wiley & Sons Ltd, 2004, pp.73–84.

Robins JM, Hernan MA and Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiol 2000; 11: 550–560.

Rosenbaum, P. R. (1986). Dropping out of high school in the United States: An observational study. Journal of Educational Statistics, 11(3), 207-224.

Stuart, E. A. (2010). Matching methods for causal inference: A review and a look forward. Statistical science: a review journal of the Institute of Mathematical Statistics, 25(1), 1.

van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in r. Journal of Statistical Software, 45 (3), 1–67. Retrieved from http://www.jstatsoft.org/v45/i03/

van der Wal, W. M., & Geskus, R. B. (2011). Ipw: an R package for inverse probability weighting. J Stat Softw, 43(13), 1-23.

Zhu, Y., Coffman, D. L., & Ghosh, D. (2015). A boosting algorithm for estimating generalized propensity scores with continuous treatments. Journal of causal inference, 3(1), 25-40.

Avatar
Megha Joshi
Doctoral candidate