# Individual Participant Data Meta-Analysis: Example with R

## Aggregated Data Meta-Analysis and IPDMA

Traditional meta-analyses use aggregated or summary level information from studies or reports (Cooper & Patall, 2009; Riley et al., 2010). Analysts conducting aggregated data meta-analysis would look up relevant literature and code summary statistics needed to calculate one or more effect sizes from each study and also code the corresponding moderator variables. And, then run meta-regression models to (1) summarize effect size estimates across studies, (2) characterize variability in effect sizes across studies, and (3) explain the variability in the effect sizes. Moderator variables will be at the effect size-level or the study-level (e.g., the outcome measure or the percentage of economically disadvantaged students in the sample used to calculate the effect). One drawback of aggregated data meta-analysis is that we cannot examine the effect of moderators at the individual level. For example, we can say that studies with higher percentage of economically disadvantaged students tend to have higher effects of some treatment. However, we cannot say that the treatment works better for economically disadvantaged students. To do so would be to commit ecological fallacy (Geissbühler et al., 2021).

Another way of conducting meta-analysis is to use individual participant-level data instead of aggregated summaries (Riley et al., 2010). For each study in the meta-analysis, the analyst would have access to the individual-level data. Outcomes and moderator variables will be at the individual level (e.g., students’ scores on an achievement test and indicator for whether or not they are economically disadvantaged). Because data is at the individual-level, analysts can conduct subgroup analyses to examine the potentially heterogeneous effects of a treatment for different subgroups (Cooper & Patall, 2009). The feasibility of conducting such subgroup analyses provides IPDMA a major advantage over aggregated data meta-analysis, which heavily rely on the analyses conducted by primary study authors who may not have reported results from subgroup analyses.

## IPDMA Analyses in R

There are two ways do conduct IPDMA: (1) one-stage meta-analysis which involves analyzing data from all studies at once; and, (2) two-stage meta-analysis which involves first analyzing individual data separately for each primary study and then synthesizing the effects using meta-regression models (Cooper & Patall, 2009; Riley et al., 2010). I will walk through how to run each of these using an example data.

### Example Dataset

I’ve tried to find a publicly available IPDMA dataset to use as an example. However, I haven’t found one appropriate for this post. Thus, I am using a dataset is from a block randomized study (Bryan et al., 2016). Blocks can be somewhat thought of as different studies in a meta-analysis (not the same thing but please go along for this post). In the data, students were randomized within classes. There are 30 classrooms (like 30 different studies) within which participants were randomized. The study examined the effects of brief psychological interventions on eating behaviors. The outcome that I am going to analyze is `autonprosocial`

, four-item self-report measure of alignment of healthy eating with adolescent values.

```
library(tidyverse)
library(knitr)
library(estimatr)
library(metafor)
library(lme4)
library(lmerTest)
library(broom)
library(broom.mixed)
library(kableExtra)
library(janitor)
options(knitr.kable.NA = '')
bryan_dat <- read_csv("https://raw.githubusercontent.com/meghapsimatrix/datasets/master/causal/bryan_dat.csv")
glimpse(bryan_dat %>% select(classroom, condition, condition_collapsed, female, autonprosocial))
```

```
## Rows: 501
## Columns: 5
## $ classroom <chr> "B2", "B2", "A1", "A2", "B3", "A3", "B4", "B2", "D…
## $ condition <chr> "expose treatment", "expose treatment", "no treatm…
## $ condition_collapsed <chr> "expose treatment", "expose treatment", "control",…
## $ female <dbl> 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1,…
## $ autonprosocial <dbl> 3.75, 4.25, 2.25, 4.25, 3.25, 3.25, 4.25, 1.25, 2.…
```

### One Stage IPDMA

One stage IPDMA basically involves running one analysis on all the data accounting for clustering by study. Below I am running HLM using `lmer()`

from `lme4`

package specifying fixed intercepts but random slopes for treatment effect by classroom. The model that I am using is based on suggestion by Bloom et al. (2016). The results table shows the treatment effect estimate (and it’s se etc.) and also the estimate of the standard deviation of the treatment effects across classrooms.

```
# creating treatment indicator
bryan_dat <-
bryan_dat %>%
mutate(trt_ind = as.integer(condition_collapsed == "expose treatment"))
# function to estimate treatment effects
estimate_effects <- function(dat, stage){
if(stage == "one"){
mod <- lmer(autonprosocial ~ 0 + classroom + trt_ind +
(0 + trt_ind | classroom),
data = dat)
} else if(stage == "two") {
mod <- lm_robust(autonprosocial ~ trt_ind, data = dat)
}
res <- tidy(mod) %>%
filter(str_detect(term, "trt_ind")) %>%
clean_names() %>%
mutate(v = std_error^2)
return(res)
}
# estimate of treatment effect and the sd of treatment effect across classrooms
estimate_effects(bryan_dat, stage = "one") %>%
kable(digits = 3)
```

effect | group | term | estimate | std_error | statistic | df | p_value | v |
---|---|---|---|---|---|---|---|---|

fixed | trt_ind | 1.157 | 0.083 | 14.014 | 25.776 | 0 | 0.007 | |

ran_pars | classroom | sd__trt_ind | 0.146 |

#### Subgroup Analyses

Below I am creating subgroups based on the `female`

variable and estimating the treatment effect and the sd of treatment effects across classrooms for each subgroup:

```
bryan_dat <- bryan_dat %>%
mutate(female = ifelse(female == 1, "Female", "Not Female"))
bryan_dat %>%
group_by(female) %>%
do(estimate_effects(., stage = "one")) %>%
kable(digits = 3)
```

```
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
```

female | effect | group | term | estimate | std_error | statistic | df | p_value | v |
---|---|---|---|---|---|---|---|---|---|

Female | fixed | trt_ind | 1.217 | 0.109 | 11.187 | 232 | 0 | 0.012 | |

Female | ran_pars | classroom | sd__trt_ind | 0.000 | |||||

Not Female | fixed | trt_ind | 1.043 | 0.115 | 9.058 | 207 | 0 | 0.013 | |

Not Female | ran_pars | classroom | sd__trt_ind | 0.000 |

### Two Stage IPDMA

#### First Stage: Primary Study Analysis

First, I estimate the average treatment effect by block. For IPDMA, we would estimate the treatment effects for each primary study:

```
first_stage_res <-
bryan_dat %>%
group_by(classroom) %>%
do(estimate_effects(., stage = "two"))
glimpse(first_stage_res)
```

```
## Rows: 30
## Columns: 11
## Groups: classroom [30]
## $ classroom <chr> "A1", "A2", "A3", "A4", "A5", "A6", "B1", "B2", "B3", "B4", …
## $ term <chr> "trt_ind", "trt_ind", "trt_ind", "trt_ind", "trt_ind", "trt_…
## $ estimate <dbl> 1.7951128, 1.4062500, 0.1250000, 0.9318182, 0.9130952, 1.333…
## $ std_error <dbl> 0.2744876, 0.2830958, 0.3683758, 0.2781655, 0.3595215, 0.496…
## $ statistic <dbl> 6.5398688, 4.9673998, 0.3393274, 3.3498700, 2.5397511, 2.687…
## $ p_value <dbl> 9.175979e-07, 3.266890e-04, 7.387762e-01, 4.386511e-03, 2.11…
## $ conf_low <dbl> 1.2285983, 0.7894372, -0.6559219, 0.3389225, 0.1545711, 0.28…
## $ conf_high <dbl> 2.3616273, 2.0230628, 0.9059219, 1.5247139, 1.6716194, 2.380…
## $ df <dbl> 24, 12, 16, 15, 17, 17, 7, 32, 26, 19, 16, 10, 12, 11, 18, 1…
## $ outcome <chr> "autonprosocial", "autonprosocial", "autonprosocial", "auton…
## $ v <dbl> 0.07534343, 0.08014323, 0.13570076, 0.07737603, 0.12925574, …
```

#### Second Stage: Meta-Analysis

Then, in the second stage, I synthesize the block specific treatment effects using metafor::rma.uni(), which weighs each effect size estimate by its precision:

```
second_stage_res <- rma.uni(yi = estimate,
sei = std_error,
data = first_stage_res,
method = "REML",
test = "knha")
tibble(
rowname = rownames(second_stage_res$b),
estimate = as.vector(second_stage_res$b),
SE = second_stage_res$se,
ci_lo = second_stage_res$ci.lb,
ci_hi = second_stage_res$ci.ub,
tau_2 = second_stage_res$tau2
) %>%
kable(digits = 3)
```

rowname | estimate | SE | ci_lo | ci_hi | tau_2 |
---|---|---|---|---|---|

intrcpt | 1.214 | 0.088 | 1.034 | 1.394 | 0.13 |

#### Subgroup Analyses

#### First Stage: Primary Study Analysis

Here, I estimate treatment effect by classroom and by the female variable:

```
subgroup_fs_res <- bryan_dat %>%
group_by(classroom, female) %>%
do(estimate_effects(., stage = "two"))
```

`## 1 coefficient not defined because the design matrix is rank deficient`

#### Second Stage: Meta-Analysis

Then, in the second stage, I synthesize the block specific subgroup effects.

```
estimate_subgroup_mod <- function(dat){
second_stage_res <- rma.uni(yi = estimate,
sei = std_error,
data = dat,
method = "REML",
test = "knha")
res <- tibble(
rowname = rownames(second_stage_res$b),
est = as.vector(second_stage_res$b),
SE = second_stage_res$se,
ci_lo = second_stage_res$ci.lb,
ci_hi = second_stage_res$ci.ub,
tau_2 = second_stage_res$tau2
)
return(res)
}
subgroup_fs_res %>%
group_by(female) %>%
do(estimate_subgroup_mod(.)) %>%
kable(digits = 3)
```

```
## Warning: Studies with NAs omitted from model fitting.
## Warning: Studies with NAs omitted from model fitting.
```

female | rowname | est | SE | ci_lo | ci_hi | tau_2 |
---|---|---|---|---|---|---|

Female | intrcpt | 1.234 | 0.124 | 0.979 | 1.489 | 0.290 |

Not Female | intrcpt | 1.179 | 0.101 | 0.972 | 1.387 | 0.171 |

## References

Bates, D., Maechler, M., Bolker, B. & Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.

Bryan, C. J., Yeager, D. S., Hinojosa, C. P., Chabot, A., Bergen, H., Kawamura, M., & Steubing, F. (2016). Harnessing adolescent values to motivate healthier eating. *Proceedings of the National Academy of Sciences*, *113*(39), 10830-10835.

Cooper, H., & Patall, E. A. (2009). The relative benefits of meta-analysis conducted with individual participant data versus aggregated data. *Psychological methods*, *14*(2), 165.

Geissbühler, M., Hincapié, C.A., Aghlmandi, S. *et al.* Most published meta-regression analyses based on aggregate data suffer from methodological pitfalls: a meta-epidemiological study. *BMC Med Res Methodol* **21**, 123 (2021). https://doi.org/10.1186/s12874-021-01310-0

Riley, R. D., Lambert, P. C., & Abo-Zaid, G. (2010). Meta-analysis of individual participant data: rationale, conduct, and reporting. *Bmj*, *340*.

Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1-48. https://doi.org/10.18637/jss.v036.i03

Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). “Welcome to the tidyverse.” _Journal of Open Source Software_, *4*(43), 1686. doi:10.21105/joss.01686 https://doi.org/10.21105/joss.01686.