I have been working in the program evaluation / causal inference space for about 9 years. Recently, though, I came across an experimental design that, strangely, I hadn’t thought much about before.
Here is an example of this design:
Students are randomized to receive one of two types of tutoring (Tutoring A vs Tutoring B). Randomization happens at the student level. However, tutoring is administered in groups by tutors whose skills are aligned with each type of tutoring. Ten tutors with advanced skills will administer Tutoring A, and seven different tutors with less advanced skills will administer Tutoring B. Each tutor, under condition A or B, will teach, say, 20 students. The point of the study is to compare the performance of students who receive Tutoring A vs Tutoring B.
Even though students are individually randomized, this is not a simple individually randomized trial because students are nested in tutors and receive treatment in clusters. Students taught by the same tutor will tend to have more similar outcomes (e.g., scores on a test) than students taught by different tutors.
But it is also NOT a cluster randomized trial (CRT), which I am much more familiar with. In CRTs, randomization happens at the cluster level.
The type of design that I am dealing with is apparently called an Individually Randomized Group Treatment (IRGT) trial. I found a couple of articles (Pals et al., 2008; Roberts & Roberts, 2005) and also reached out to my forever advisor James Pustejovsky for his thoughts about this design. What follows is a summary of what I learned. For the sake of this post, let’s assume a two-arm trial.
Independence Assumption
When we deal with a simple individually randomized two-group comparison, we typically use a t-test or linear regression to estimate the treatment effect. To make inferences from these models, we have to assume that the errors are independent. The grouping or clustering of units in CRTs and IRGTs violates that assumption. Students taught by the same tutor will likely have more similar outcomes than students taught by different tutors. Not accounting for such dependence, or between-cluster variation, can lead to underestimated standard errors and, thus, inflated Type I error rates and confidence intervals that are too narrow.
When we are dealing with IRGTs, there are a few nuances that we need to consider when accounting for clustering.
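To make the consequence concrete, here is a minimal simulation sketch in Python (standard library only). It borrows the tutoring setup from the example (10 and 7 tutors, 20 students each) but everything else is an assumption made for illustration: the function names, the tutor-level and student-level standard deviations, and the number of replications. There is no true treatment effect in the simulated data, so any "significant" result is a false positive.

```python
import random
import statistics


def simulate_trial(rng, n_tutors=(10, 7), students_per_tutor=20,
                   tutor_sd=0.5, student_sd=1.0):
    """Simulate one trial with NO true treatment effect.

    Returns the difference in arm means and the naive standard error
    that treats every student as an independent observation.
    All numeric defaults are illustrative assumptions.
    """
    arms = []
    for k in n_tutors:
        scores = []
        for _ in range(k):
            # Effect shared by every student of this tutor (source of clustering).
            tutor_effect = rng.gauss(0.0, tutor_sd)
            for _ in range(students_per_tutor):
                scores.append(tutor_effect + rng.gauss(0.0, student_sd))
        arms.append(scores)
    a, b = arms
    diff = statistics.fmean(a) - statistics.fmean(b)
    naive_se = (statistics.variance(a) / len(a)
                + statistics.variance(b) / len(b)) ** 0.5
    return diff, naive_se


def compare_standard_errors(n_reps=500, seed=1):
    """Compare the naive SE with the actual spread of the estimates."""
    rng = random.Random(seed)
    results = [simulate_trial(rng) for _ in range(n_reps)]
    diffs = [d for d, _ in results]
    empirical_se = statistics.stdev(diffs)          # true sampling variability
    mean_naive_se = statistics.fmean(se for _, se in results)
    # Share of replications a naive z-test would call significant at 5%.
    false_positive_rate = sum(abs(d) > 1.96 * se for d, se in results) / n_reps
    return empirical_se, mean_naive_se, false_positive_rate


if __name__ == "__main__":
    emp, naive, fpr = compare_standard_errors()
    print(f"empirical SE: {emp:.3f}  naive SE: {naive:.3f}  "
          f"false positive rate: {fpr:.2f}")
```

Under these assumed values, the naive standard error should come out well below the empirical one, and the false positive rate should sit well above the nominal 5%.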
Clustering in the Different Treatment Conditions
In cluster randomized trials, both treatment and control groups will contain clusters, as clusters are randomized into the treatment conditions. In the example I described above, both the treatment and control groups have clustering. BUT there is another version of IRGT in which students who get assigned to treatment are tutored by different tutors, whereas students who get assigned to control do not receive any tutoring. Hence, clustering only happens in the treatment group.
The actual extent of clustering can also differ between the two conditions in IRGTs in a way that it doesn’t in CRTs (Roberts & Roberts, 2005). The extent of clustering is measured with a metric called the design effect, which for CRTs and IRGTs is based on cluster size as well as the intra-cluster correlation coefficient (ICC). The ICC is defined as the proportion of variance in the outcome that can be explained by the groups or clusters (Pals et al., 2008).
In CRTs, if randomization goes as planned, the treatment and control conditions should have clusters of roughly similar size. However, we cannot expect the same in IRGTs.
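As a rough illustration of how cluster size and the ICC combine, the sketch below uses the common equal-cluster-size form of the design effect, 1 + (m - 1) × ICC, where m is the cluster size. The function names and the ICC value of 0.05 are assumptions chosen for illustration; the cluster counts and sizes come from the tutoring example.

```python
def design_effect(cluster_size, icc):
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent observations the clustered sample is 'worth'."""
    total_n = n_clusters * cluster_size
    return total_n / design_effect(cluster_size, icc)


if __name__ == "__main__":
    assumed_icc = 0.05  # hypothetical value, not taken from any study
    # Tutoring A: 10 tutors x 20 students; Tutoring B: 7 tutors x 20 students.
    for arm, n_tutors in (("A", 10), ("B", 7)):
        ess = effective_sample_size(n_tutors, 20, assumed_icc)
        print(f"Tutoring {arm}: design effect "
              f"{design_effect(20, assumed_icc):.2f}, "
              f"effective sample size {ess:.1f}")
```

With an ICC of zero the design effect is 1 and the effective sample size equals the actual sample size, which is the unclustered case.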
Moreover, ICCs can also differ between the treatment and control conditions in IRGTs. As I mentioned above, we can have a design where there is clustering only in the treatment group and not in the control group, so the ICC would only apply to the treatment group. Even in designs where clustering is present in both treatment and control groups, we can have situations where the effect of clustering is stronger in one group than in the other. Going back to our tutoring example, suppose tutors are assigned based on availability or location. Then the groups will be even more different.
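To see how arm-specific clustering feeds into the precision of the treatment effect estimate, here is a small sketch that gives each arm its own design effect when computing the standard error of a difference in means. It assumes equal cluster sizes within an arm and a known total outcome variance; the function names, the ICC values, and the default variance of 1 are hypothetical choices for illustration only.

```python
import math


def arm_mean_variance(n_clusters, cluster_size, icc, total_variance=1.0):
    """Variance of an arm's mean with that arm's own ICC and cluster size."""
    n = n_clusters * cluster_size
    design_effect = 1.0 + (cluster_size - 1) * icc
    return total_variance * design_effect / n


def se_difference(arm_a, arm_b):
    """Standard error of the difference in means between two arms.

    Each argument is a dict of keyword arguments for arm_mean_variance.
    """
    return math.sqrt(arm_mean_variance(**arm_a) + arm_mean_variance(**arm_b))


if __name__ == "__main__":
    # Both arms clustered, but clustering assumed stronger under Tutoring A.
    both_clustered = se_difference(
        {"n_clusters": 10, "cluster_size": 20, "icc": 0.10},
        {"n_clusters": 7, "cluster_size": 20, "icc": 0.02},
    )
    # Control arm with no tutoring: every student is their own "cluster".
    control_unclustered = se_difference(
        {"n_clusters": 10, "cluster_size": 20, "icc": 0.10},
        {"n_clusters": 140, "cluster_size": 1, "icc": 0.0},
    )
    print(f"both arms clustered:  {both_clustered:.4f}")
    print(f"control unclustered:  {control_unclustered:.4f}")
```

Treating an unclustered control arm as clusters of size one keeps the same formula for both designs, since the design effect is then 1 regardless of the ICC.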
Treatment Differences
SUTVA
As in cluster randomized trials, we do need to think about the possibility of multiple treatments being deployed.
In cluster randomized trials, people select themselves into groups. In IRGTs, individuals are assigned to groups. They can be assigned randomly or, more realistically, units can get assigned to clusters based on location, time, or availability (for example, of tutors).