In this post, I visualize and analyze the unprecedented increase in the number of unemployment claims filed in the US after the lockdown due to the COVID-19 pandemic. I retrieve the data using the tidyquant package (Dancho & Vaughan, 2020).
library(CausalImpact)
library(tidyverse)
library(scales)
library(tidyquant)

ICSA Data

Initial unemployment claims from the first date available, 1967:
icsa_dat <- "ICSA" %>%
  tq_get(get = "economic.data", from = "1967-01-07") %>%
  rename(claims = price)

glimpse(icsa_dat)
## Rows: 2,790
## Columns: 3
## $ symbol <chr> "ICSA", "ICSA", "ICSA", "ICSA", "ICSA", "ICSA", "ICSA", "ICSA"…
## $ date <date> 1967-01-07, 1967-01-14, 1967-01-21, 1967-01-28, 1967-02-04, 1…
## $ claims <int> 208000, 207000, 217000, 204000, 216000, 229000, 229000, 242000…

icsa_dat %>%
  ggplot(aes(x = date, y = claims)) +
  geom_line(color = "blue") +
  scale_y_continuous(labels = comma) +
  labs(x = "Date", y = "Claims", subtitle = "As of June 29, 2020") +
  ggtitle("Unemployment Claims: 1967 to 2020") +
  theme_bw()

Comparison to 2008 Recession

In the graph below, I only selected 2008 to 2020.
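Restricting the series to 2008 onward comes down to a date filter. The following is a minimal sketch, assuming a small stand-in data frame (`icsa_toy`, a hypothetical name) so it runs without downloading anything; the post-1967 claim values are placeholders, not real ICSA figures.

```r
library(dplyr)

# Toy stand-in for icsa_dat. The first three rows echo the glimpse() output
# above; the later rows use placeholder values, NOT real ICSA figures.
icsa_toy <- tibble(
  symbol = "ICSA",
  date   = as.Date(c("1967-01-07", "1967-01-14", "1967-01-21",
                     "2008-01-05", "2014-06-07", "2020-06-20")),
  claims = c(208000L, 207000L, 217000L, 100L, 200L, 300L)
)

# Keep only observations from 2008 onward
icsa_recent <- icsa_toy %>%
  filter(date >= as.Date("2008-01-01"))

icsa_recent
```

With the real data, the same `filter()` call could be applied to `icsa_dat` before piping into `ggplot()`.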
In this post, I walk through the steps of running a propensity score analysis when there is missingness in the covariate data. In particular, I look at multiple imputation and ways to condition on propensity scores estimated with imputed data. The code builds on my earlier post, where I go over different ways to handle missing data when conducting propensity score analysis. Here, I use a tidyeval approach to working with multiply imputed data.
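To make the idea concrete, here is a rough sketch of estimating propensity scores on multiply imputed data. It assumes the mice package for imputation and uses simulated data; the variable names (`sim`, `treat`, `x1`, `x2`) and the missingness mechanism are illustrative assumptions, and this is not necessarily the workflow the full post follows.

```r
library(mice)

set.seed(123)
n <- 500

# Simulated data (illustrative only): two covariates and a binary treatment
x1 <- rnorm(n)
x2 <- rnorm(n)
treat <- rbinom(n, 1, plogis(0.5 * x1 - 0.3 * x2))
sim <- data.frame(treat, x1, x2)

# Set 20% of x1 to missing, completely at random
sim$x1[sample(n, 100)] <- NA

# Create m imputed datasets
m <- 5
imp <- mice(sim, m = m, seed = 123, printFlag = FALSE)

# Estimate propensity scores separately within each imputed dataset;
# the result is an n x m matrix with one column per imputation
ps_mat <- sapply(seq_len(m), function(i) {
  dat_i <- complete(imp, i)
  fit <- glm(treat ~ x1 + x2, family = binomial, data = dat_i)
  fitted(fit)
})

# One option: average each unit's propensity score across imputations
ps_across <- rowMeans(ps_mat)
```

From here, one could either condition on the scores within each imputed dataset and pool the resulting effect estimates, or condition once on the averaged scores in `ps_across`.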
Theories behind propensity score analysis assume that the covariates are fully observed (Rosenbaum & Rubin, 1983, 1984). In practice, however, observational analyses typically rely on large administrative databases or surveys, which inevitably have missingness in the covariates. The response patterns of people with missing covariates may be different from those of people with observed data (Mohan, Pearl, & Tian, 2013). Therefore, ways to handle missing covariate data need to be examined.
I wanted to analyze the data from the April 2015 Nepal earthquake that resulted in around 10,000 deaths. I am using a dataset that I found on data.world. The data contains the date, time, location, and magnitude of the earthquake and of the many aftershocks that followed. The data is updated as of June 2, 2015.
Nepal is my birthplace, my homeland. The earthquake was an extremely traumatic event for people who live there.
Load the Data and Check Duplicates

library(tidyverse)
library(lubridate)
library(kableExtra)
library(ggridges)

# there were complete duplicated rows
dat <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-10-22/horror_movies.csv") %>%
  distinct(.) # removes complete dups

# check duplicates
dup_title <- dat %>%
  filter(duplicated(title) | duplicated(title, fromLast = TRUE)) %>%
  arrange(title)

# examined they seem different movies even though same title
dup_title %>%
  filter(duplicated(plot))
## # A tibble: 0 x 12
## # … with 12 variables: title <chr>, genres <chr>, release_date <chr>,
## #   release_country <chr>, movie_rating <chr>, review_rating <dbl>,
## #   movie_run_time <chr>, plot <chr>, cast <chr>, language <chr>,
## #   filming_locations <chr>, budget <chr>

dup_title %>%
  filter(duplicated(release_date) | duplicated(release_date, fromLast = TRUE))
## # A tibble: 2 x 12
##   title genres release_date release_country movie_rating review_rating
##   <chr> <chr>  <chr>        <chr>           <chr>                <dbl>
## 1 The … Comed… 21-Jul-15    USA             <NA>                   5.
In the written part of my qualifying exam, I was asked how to analyze the effect of a continuous, rather than binary, treatment using propensity score analysis. I skipped the question in the written part, but I spent a few days looking up how to do this in case I was asked during my oral examination. Sadly, no one asked me, even when I asked them to, so here is a blog post detailing my explorations.
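As a rough illustration of what a propensity score analysis with a continuous treatment can look like, here is a sketch of one common approach: estimate a generalized propensity score from a normal linear model for the treatment, then weight by stabilized inverse density weights. The simulated data, the variable names (`dose`, `x`, `y`), and the choice of this particular method are assumptions made for illustration; the full post may take a different route.

```r
set.seed(42)
n <- 5000

# Simulated data (illustrative only): x confounds the dose-outcome relation,
# and the true effect of a one-unit increase in dose on y is 2
x <- rnorm(n)
dose <- 0.5 * x + rnorm(n)
y <- 2 * dose + x + rnorm(n)

# Treatment model: conditional density of dose given x, assuming normality
trt_fit <- lm(dose ~ x)
gps <- dnorm(dose, mean = fitted(trt_fit), sd = sigma(trt_fit))

# Stabilized weights: marginal density of dose over conditional density
marg <- dnorm(dose, mean = mean(dose), sd = sd(dose))
sw <- marg / gps

# Compare the unweighted and weighted estimates of the dose effect
naive_est <- coef(lm(y ~ dose))[["dose"]]
weighted_est <- coef(lm(y ~ dose, weights = sw))[["dose"]]

c(naive = naive_est, weighted = weighted_est)
```

In this simulation the unweighted slope absorbs some of the confounding through `x`, while the weighted slope should land closer to the true value of 2. The normality assumption for the treatment model is a strong one and would need checking with real data.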