Individual Participant Data Meta-Analysis: Example with R

Aggregated Data Meta-Analysis and IPDMA

Traditional meta-analyses use aggregated or summary level information from studies or reports (Cooper & Patall, 2009; Riley et al., 2010). Analysts conducting aggregated data meta-analysis would look up relevant literature and code summary statistics needed to calculate one or more effect sizes from each study and also code the corresponding moderator variables. And, then run meta-regression models to (1) summarize effect size estimates across studies, (2) characterize variability in effect sizes across studies, and (3) explain the variability in the effect sizes. Moderator variables will be at the effect size-level or the study-level (e.g., the outcome measure or the percentage of economically disadvantaged students in the sample used to calculate the effect). One drawback of aggregated data meta-analysis is that we cannot examine the effect of moderators at the individual level. For example, we can say that studies with higher percentage of economically disadvantaged students tend to have higher effects of some treatment. However, we cannot say that the treatment works better for economically disadvantaged students. To do so would be to commit ecological fallacy (Geissbühler et al., 2021).

Another way of conducting meta-analysis is to use individual participant-level data instead of aggregated summaries (Riley et al., 2010). For each study in the meta-analysis, the analyst would have access to the individual-level data. Outcomes and moderator variables will be at the individual level (e.g., students’ scores on an achievement test and indicator for whether or not they are economically disadvantaged). Because data is at the individual-level, analysts can conduct subgroup analyses to examine the potentially heterogeneous effects of a treatment for different subgroups (Cooper & Patall, 2009). The feasibility of conducting such subgroup analyses provides IPDMA a major advantage over aggregated data meta-analysis, which heavily rely on the analyses conducted by primary study authors who may not have reported results from subgroup analyses.

IPDMA Analyses in R

There are two ways do conduct IPDMA: (1) one-stage meta-analysis which involves analyzing data from all studies at once; and, (2) two-stage meta-analysis which involves first analyzing individual data separately for each primary study and then synthesizing the effects using meta-regression models (Cooper & Patall, 2009; Riley et al., 2010). I will walk through how to run each of these using an example data.

Example Dataset

I’ve tried to find a publicly available IPDMA dataset to use as an example. However, I haven’t found one appropriate for this post. Thus, I am using a dataset is from a block randomized study (Bryan et al., 2016). Blocks can be somewhat thought of as different studies in a meta-analysis (not the same thing but please go along for this post). In the data, students were randomized within classes. There are 30 classrooms (like 30 different studies) within which participants were randomized. The study examined the effects of brief psychological interventions on eating behaviors. The outcome that I am going to analyze is autonprosocial, four-item self-report measure of alignment of healthy eating with adolescent values.

library(tidyverse)
library(knitr)
library(estimatr)
library(metafor)
library(lme4)
library(lmerTest)
library(broom)
library(broom.mixed)
library(kableExtra)
library(janitor)

options(knitr.kable.NA = '')

bryan_dat <- read_csv("https://raw.githubusercontent.com/meghapsimatrix/datasets/master/causal/bryan_dat.csv")

glimpse(bryan_dat %>% select(classroom, condition, condition_collapsed, female, autonprosocial))
## Rows: 501
## Columns: 5
## $ classroom           <chr> "B2", "B2", "A1", "A2", "B3", "A3", "B4", "B2", "D…
## $ condition           <chr> "expose treatment", "expose treatment", "no treatm…
## $ condition_collapsed <chr> "expose treatment", "expose treatment", "control",…
## $ female              <dbl> 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1,…
## $ autonprosocial      <dbl> 3.75, 4.25, 2.25, 4.25, 3.25, 3.25, 4.25, 1.25, 2.…

One Stage IPDMA

One stage IPDMA basically involves running one analysis on all the data accounting for clustering by study. Below I am running HLM using lmer() from lme4 package specifying fixed intercepts but random slopes for treatment effect by classroom. The model that I am using is based on suggestion by Bloom et al. (2016). The results table shows the treatment effect estimate (and it’s se etc.) and also the estimate of the standard deviation of the treatment effects across classrooms.

# creating treatment indicator
bryan_dat <-
  bryan_dat %>%
  mutate(trt_ind = as.integer(condition_collapsed == "expose treatment"))


# function to estimate treatment effects
estimate_effects <- function(dat, stage){
  
  if(stage == "one"){
    
    mod <- lmer(autonprosocial ~ 0 + classroom + trt_ind + 
                (0 + trt_ind | classroom), 
                data = dat)
  
  } else if(stage == "two") {
    
    mod <- lm_robust(autonprosocial ~ trt_ind, data = dat)
    
  }
  
  res <- tidy(mod) %>%
    filter(str_detect(term, "trt_ind")) %>%
    clean_names() %>%
    mutate(v = std_error^2)
  
  return(res)
  
}



# estimate of treatment effect and the sd of treatment effect across classrooms
estimate_effects(bryan_dat, stage = "one") %>%
  kable(digits = 3) 
effect group term estimate std_error statistic df p_value v
fixed trt_ind 1.157 0.083 14.014 25.776 0 0.007
ran_pars classroom sd__trt_ind 0.146

Subgroup Analyses

Below I am creating subgroups based on the female variable and estimating the treatment effect and the sd of treatment effects across classrooms for each subgroup:

bryan_dat <- bryan_dat %>%
  mutate(female = ifelse(female == 1, "Female", "Not Female"))

bryan_dat %>%
  group_by(female) %>%
  do(estimate_effects(., stage = "one")) %>%
  kable(digits = 3)
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
female effect group term estimate std_error statistic df p_value v
Female fixed trt_ind 1.217 0.109 11.187 232 0 0.012
Female ran_pars classroom sd__trt_ind 0.000
Not Female fixed trt_ind 1.043 0.115 9.058 207 0 0.013
Not Female ran_pars classroom sd__trt_ind 0.000

Two Stage IPDMA

First Stage: Primary Study Analysis

First, I estimate the average treatment effect by block. For IPDMA, we would estimate the treatment effects for each primary study:

first_stage_res <-
  bryan_dat %>%
  group_by(classroom) %>%
  do(estimate_effects(., stage = "two")) 

glimpse(first_stage_res)
## Rows: 30
## Columns: 11
## Groups: classroom [30]
## $ classroom <chr> "A1", "A2", "A3", "A4", "A5", "A6", "B1", "B2", "B3", "B4", …
## $ term      <chr> "trt_ind", "trt_ind", "trt_ind", "trt_ind", "trt_ind", "trt_…
## $ estimate  <dbl> 1.7951128, 1.4062500, 0.1250000, 0.9318182, 0.9130952, 1.333…
## $ std_error <dbl> 0.2744876, 0.2830958, 0.3683758, 0.2781655, 0.3595215, 0.496…
## $ statistic <dbl> 6.5398688, 4.9673998, 0.3393274, 3.3498700, 2.5397511, 2.687…
## $ p_value   <dbl> 9.175979e-07, 3.266890e-04, 7.387762e-01, 4.386511e-03, 2.11…
## $ conf_low  <dbl> 1.2285983, 0.7894372, -0.6559219, 0.3389225, 0.1545711, 0.28…
## $ conf_high <dbl> 2.3616273, 2.0230628, 0.9059219, 1.5247139, 1.6716194, 2.380…
## $ df        <dbl> 24, 12, 16, 15, 17, 17, 7, 32, 26, 19, 16, 10, 12, 11, 18, 1…
## $ outcome   <chr> "autonprosocial", "autonprosocial", "autonprosocial", "auton…
## $ v         <dbl> 0.07534343, 0.08014323, 0.13570076, 0.07737603, 0.12925574, …

Second Stage: Meta-Analysis

Then, in the second stage, I synthesize the block specific treatment effects using metafor::rma.uni(), which weighs each effect size estimate by its precision:

second_stage_res <- rma.uni(yi = estimate, 
                            sei = std_error,
                            data = first_stage_res, 
                            method = "REML", 
                            test = "knha")

tibble(
  rowname = rownames(second_stage_res$b),
  estimate = as.vector(second_stage_res$b),
  SE = second_stage_res$se,
  ci_lo = second_stage_res$ci.lb,
  ci_hi = second_stage_res$ci.ub,
  tau_2 = second_stage_res$tau2
) %>%
  kable(digits = 3)
rowname estimate SE ci_lo ci_hi tau_2
intrcpt 1.214 0.088 1.034 1.394 0.13

Subgroup Analyses

First Stage: Primary Study Analysis

Here, I estimate treatment effect by classroom and by the female variable:

subgroup_fs_res <- bryan_dat %>%
  group_by(classroom, female) %>%
  do(estimate_effects(., stage = "two"))
## 1 coefficient  not defined because the design matrix is rank deficient

Second Stage: Meta-Analysis

Then, in the second stage, I synthesize the block specific subgroup effects.

estimate_subgroup_mod <- function(dat){
  
  second_stage_res <- rma.uni(yi = estimate, 
                              sei = std_error,
                              data = dat, 
                              method = "REML", 
                              test = "knha")
  
  res <- tibble(
      rowname = rownames(second_stage_res$b),
      est = as.vector(second_stage_res$b),
      SE = second_stage_res$se,
      ci_lo = second_stage_res$ci.lb,
      ci_hi = second_stage_res$ci.ub,
      tau_2 = second_stage_res$tau2
    )

  return(res)
  
}
  
subgroup_fs_res %>%
  group_by(female) %>%
  do(estimate_subgroup_mod(.)) %>%
  kable(digits = 3)
## Warning: Studies with NAs omitted from model fitting.

## Warning: Studies with NAs omitted from model fitting.
female rowname est SE ci_lo ci_hi tau_2
Female intrcpt 1.234 0.124 0.979 1.489 0.290
Not Female intrcpt 1.179 0.101 0.972 1.387 0.171

References

Bates, D., Maechler, M., Bolker, B. & Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.

Bryan, C. J., Yeager, D. S., Hinojosa, C. P., Chabot, A., Bergen, H., Kawamura, M., & Steubing, F. (2016). Harnessing adolescent values to motivate healthier eating. Proceedings of the National Academy of Sciences, 113(39), 10830-10835.

Cooper, H., & Patall, E. A. (2009). The relative benefits of meta-analysis conducted with individual participant data versus aggregated data. Psychological methods, 14(2), 165.

Geissbühler, M., Hincapié, C.A., Aghlmandi, S. et al. Most published meta-regression analyses based on aggregate data suffer from methodological pitfalls: a meta-epidemiological study. BMC Med Res Methodol 21, 123 (2021). https://doi.org/10.1186/s12874-021-01310-0

Riley, R. D., Lambert, P. C., & Abo-Zaid, G. (2010). Meta-analysis of individual participant data: rationale, conduct, and reporting. Bmj, 340.

Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1-48. https://doi.org/10.18637/jss.v036.i03

Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). “Welcome to the tidyverse.” _Journal of Open Source Software_, *4*(43), 1686. doi:10.21105/joss.01686 https://doi.org/10.21105/joss.01686.

Avatar
Megha Joshi
Senior Quantitative Researcher

Related