R

Individual Participant Data Meta-Analysis: Example with R

Aggregated Data Meta-Analysis and IPDMA: Traditional meta-analyses use aggregated or summary-level information from studies or reports (Cooper & Patall, 2009; Riley et al., 2010). Analysts conducting an aggregated data meta-analysis look up the relevant literature, code the summary statistics needed to calculate one or more effect sizes from each study, and code the corresponding moderator variables. They then run meta-regression models to (1) summarize effect size estimates across studies, (2) characterize variability in effect sizes across studies, and (3) explain the variability in the effect sizes.
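As a rough illustration of steps (1) and (3), here is a minimal base R sketch of an aggregated data meta-analysis. The effect sizes, sampling variances, and the `dose` moderator are made-up values for illustration only, and the model is a simple common-effect (inverse-variance) one with no between-study variance component. It is not the code from the post, which may well use a dedicated package.

```r
# Hypothetical summary data: one effect size (yi) and its sampling
# variance (vi) per study, plus a made-up study-level moderator.
dat <- data.frame(
  study = paste0("S", 1:6),
  yi    = c(0.10, 0.25, 0.32, 0.45, 0.51, 0.60),
  vi    = c(0.040, 0.030, 0.050, 0.020, 0.030, 0.040),
  dose  = c(1, 2, 2, 3, 4, 4)
)

# (1) Summarize: inverse-variance weighted average of the effect sizes
wi      <- 1 / dat$vi
pooled  <- sum(wi * dat$yi) / sum(wi)
se_pool <- sqrt(1 / sum(wi))

# (3) Explain: weighted least squares meta-regression on the moderator,
# treating the sampling variances as known
X      <- cbind(intercept = 1, dose = dat$dose)
W      <- diag(wi)
vcov_b <- solve(t(X) %*% W %*% X)
beta   <- vcov_b %*% t(X) %*% W %*% dat$yi
se_b   <- sqrt(diag(vcov_b))

round(c(pooled = pooled, se = se_pool), 3)
round(cbind(estimate = drop(beta), se = se_b), 3)
```

The matrix form is used instead of `lm(..., weights = wi)` because `lm()` rescales the standard errors by an estimated residual variance, which is not what a meta-regression with known sampling variances assumes.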

What Should I Use to Quantify Heterogeneity in Meta-Analysis?

Meta-analysis: Scientific researchers tend to produce literature on the same topic, either to replicate or extend prior studies or because they are unaware of prior evidence (Hedges & Cooper, 2009). Results across studies tend to vary, even when researchers try to replicate studies, because of differences in sample characteristics, research designs, analytic strategies, or sampling error (Hedges & Cooper, 2009). Meta-analysis is a set of statistical techniques for synthesizing results from multiple primary studies on a common topic.
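To make "results across studies tend to vary" concrete, the sketch below computes three commonly reported heterogeneity statistics in base R: Cochran's Q, the DerSimonian-Laird estimate of the between-study variance (tau-squared), and I-squared. The effect sizes and variances are hypothetical, and which statistic the post ultimately recommends is not shown in this excerpt.

```r
# Hypothetical effect sizes and sampling variances from k studies
yi <- c(0.10, 0.25, 0.32, 0.45, 0.51, 0.60)
vi <- c(0.040, 0.030, 0.050, 0.020, 0.030, 0.040)
k  <- length(yi)

# Inverse-variance weights and the common-effect pooled estimate
wi    <- 1 / vi
mu_fe <- sum(wi * yi) / sum(wi)

# Cochran's Q: weighted squared deviations from the pooled estimate
Q  <- sum(wi * (yi - mu_fe)^2)
df <- k - 1

# DerSimonian-Laird estimate of between-study variance, truncated at zero
C    <- sum(wi) - sum(wi^2) / sum(wi)
tau2 <- max(0, (Q - df) / C)

# I-squared: share of total variability attributed to heterogeneity (%)
I2 <- max(0, (Q - df) / Q) * 100

c(Q = Q, df = df, tau2 = tau2, I2 = I2)
```

Q and I-squared depend on the precision of the included studies, whereas tau-squared is on the scale of the effect size itself, so the three numbers answer somewhat different questions.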

What is a Confidence Interval?

    library(tidyverse)
    library(knitr)

Distributions: There are three kinds of distributions.

Population Distribution: There are around 15 million adults in Texas, and I am interested in estimating the average height of all adult Texans. I cannot go out and measure everyone's height because of financial and capacity constraints. For the sake of this example, I am going to draw up a hypothetical population distribution of the heights of all people in Texas.
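The repeated-sampling idea behind a confidence interval can be sketched in base R as follows. The population size, mean, standard deviation, sample size, and number of replications are all assumptions chosen for illustration (a smaller population than the 15 million mentioned above, to keep the simulation quick), not the values used in the post.

```r
set.seed(20210301)

# Hypothetical population of heights in inches (assumed mean and SD)
pop <- rnorm(100000, mean = 67, sd = 4)
mu  <- mean(pop)  # the population parameter we pretend not to know

# Draw one sample and return its 95% confidence interval for the mean
one_ci <- function(n = 50) {
  x <- sample(pop, n)
  t.test(x, conf.level = 0.95)$conf.int
}

# Repeat the sampling many times; each row holds one interval
cis <- t(replicate(1000, one_ci()))

# Proportion of intervals that contain the population mean
coverage <- mean(cis[, 1] <= mu & mu <= cis[, 2])
coverage
```

Under these assumptions, the proportion of intervals that capture the population mean should land close to the nominal 95%, which is the long-run property the confidence level describes.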

ERCOT Price Gouging

Welcome to the Deregulated State of Texas! During last week's crisis in Texas, brought on by government failure and capitalism, millions of people suffered in freezing temperatures and went without water, and some Texans received egregious electricity bills. I found historical data on Electric Reliability Council of Texas (ERCOT) settlement point prices. ERCOT manages the power grid system in Texas. According to the Quick Facts from ERCOT, “it also performs financial settlement for the competitive wholesale bulk-power market and administers retail switching for seven million premises in competitive choice areas.

Unemployment Claims COVID-19

In this post, I visualize and analyze the unprecedented increase in the number of unemployment claims filed in the US after the lockdown due to the COVID-19 pandemic. I retrieve the data using the tidyquant package (Dancho & Vaughan, 2020).

    library(CausalImpact)
    library(tidyverse)
    library(scales)
    library(tidyquant)

ICSA Data: Initial unemployment claims from the first date available, 1967:

    icsa_dat <- "ICSA" %>%
      tq_get(get = "economic.data", from = "1967-01-07") %>%
      rename(claims = price)

    glimpse(icsa_dat)
    ## Rows: 2,790
    ## Columns: 3
    ## $ symbol <chr> "ICSA", "ICSA", "ICSA", "ICSA", "ICSA", "ICSA", "ICSA", "ICSA"…
    ## $ date   <date> 1967-01-07, 1967-01-14, 1967-01-21, 1967-01-28, 1967-02-04, 1…
    ## $ claims <int> 208000, 207000, 217000, 204000, 216000, 229000, 229000, 242000…

    icsa_dat %>%
      ggplot(aes(x = date, y = claims)) +
      geom_line(color = "blue") +
      scale_y_continuous(labels = comma) +
      labs(x = "Date", y = "Claims", subtitle = "As of June 29, 2020") +
      ggtitle("Unemployment Claims: 1967 to 2020") +
      theme_bw()

Comparison to 2008 Recession: In the graph below, I only selected 2008 to 2020.

Propensity Score Analysis with Multiply Imputed Data

In this post, I walk through the steps of running a propensity score analysis when there is missingness in the covariate data. In particular, I look at multiple imputation and ways to condition on propensity scores estimated with imputed data. The code builds on my earlier post, where I go over different ways to handle missing data when conducting propensity score analysis. I go through a tidyeval way of dealing with multiply imputed data.

Missing Data in Propensity Score Analysis

Theories behind propensity score analysis assume that the covariates are fully observed (Rosenbaum & Rubin, 1983, 1984). However, in practice, observational analyses require large administrative databases or surveys, which inevitably will have missingness in the covariates. The response patterns of people with missing covariates may be different than those of people with observed data (Mohan, Pearl, & Tian, 2013). Therefore, ways to handle missing covariate data need to be examined.

Nepal Earthquake

I wanted to analyze the data from the April 2015 Nepal earthquake that resulted in around 10,000 deaths. I am using a dataset that I found on data.world. The data contain the date, time, location, and magnitude of the earthquake and the many aftershocks that followed. The data are current as of June 2, 2015. Nepal is my birthplace, my homeland. The earthquake was an extremely traumatic event for people who live there.

Continuous Treatment in Propensity Score Analysis

In the written part of my qualifying exam, I was asked how to analyze the effect of a continuous, rather than binary, treatment using propensity score analysis. I skipped the question in the written part, but I spent a few days looking up how to analyze this in case I was asked during my oral examination. Sadly, no one asked me even when I asked them to, so here is a blog post detailing my explorations.