How can you generalize an n-of-1 trial's results to another person?
Last updated: Jun 28, 2024
I repeatedly get this question from colleagues. And it’s important—foundational, in fact!
The real question they’re asking is:
Why would we use anything other than a randomized controlled trial (RCT)* to find real effects that generalize beyond our study sample?
It’s akin to asking how one would generalize the results (i.e., average treatment effects or ATEs) from an RCT to non-participants. The non-participants would have to resemble the participants in the RCT by meeting the same eligibility criteria as the RCT’s target population. If they do, the effects might transport (i.e., generalize) to these non-participants.
The person is the population.
Similarly, the next would-be n-of-1 trial** participant would have to “be a target population” resembling the original n-of-1 trial participant. That is, their own full set of multivariate time series*** needs to be similar enough to the original participant’s that they “meet the same eligibility criteria,” namely the characteristics known (or reasonably assumed) to meaningfully modify the treatment effects (i.e., effect modifiers).
In this way, the n-of-1 trial participant is a “population-of-one” (Daza, 2018). That is, the participant’s own history—the set consisting of multiple versions of the participant over time—is the target population of an n-of-1 trial. The person is the population (Daza, 2019).
Understood this way, the original question then becomes one of transportability:
How well does one n-of-1 trial participant’s results (“population”-level effects) generalize to another person (“population”)?
So, an n-of-1 trial’s results don’t generalize… in general?
An n-of-1 trial is, for example, a more general case of the no-pooling approach (one separate analysis per participant), which a mixed-effects model turns into a partial-pooling approach.
Consider a two-arm repeated-measures RCT in which participants cross over once. All participants receive both treatments, but the order of the treatment phases is randomized. Participants are measured multiple times during each treatment phase.
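To make this design concrete, here is a minimal Python sketch of how such a two-arm, single-crossover schedule might be generated. The arm labels "A" and "B", the function name `crossover_schedule`, and the fixed number of measurements per phase are illustrative assumptions, not part of any particular trial protocol.

```python
import random
from typing import Dict, List, Optional, Sequence


def crossover_schedule(
    participant_ids: Sequence[str],
    n_per_phase: int,
    seed: Optional[int] = None,
) -> Dict[str, List[str]]:
    """Assign each participant a randomized treatment order (AB or BA).

    Every participant receives both arms exactly once (a single crossover),
    with n_per_phase repeated measurements scheduled in each phase.
    """
    if n_per_phase < 1:
        raise ValueError("n_per_phase must be at least 1")
    rng = random.Random(seed)
    schedule: Dict[str, List[str]] = {}
    for pid in participant_ids:
        # Randomize only the ORDER of the two phases, not their content.
        first, second = rng.choice([("A", "B"), ("B", "A")])
        schedule[pid] = [first] * n_per_phase + [second] * n_per_phase
    return schedule


if __name__ == "__main__":
    for pid, arms in crossover_schedule(["p1", "p2", "p3"], n_per_phase=3, seed=7).items():
        print(pid, "".join(arms))
```

Only the order is random; every participant still contributes measurements under both treatments, which is what makes a within-person comparison possible.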
The mixed-effects model’s ATE for a given trial participant is a conditional ATE (i.e., conditioned on that participant’s random effects). In an n-of-1 trial, this conditional ATE is called the “average period treatment effect” or APTE (Daza, 2018), a stable average recurring individual treatment effect—the direct analogue of the ATE estimated by an RCT.
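As a rough illustration, the simplest within-person estimate of this kind of effect is the difference between one participant’s mean outcome during treatment periods and during control periods. The sketch below assumes no carryover, no time trend, and no autocorrelation; an actual APTE analysis (Daza, 2018) may need to adjust for all three. The function name `naive_apte` is hypothetical.

```python
from statistics import mean
from typing import Sequence


def naive_apte(
    outcomes: Sequence[float],
    arms: Sequence[str],
    treated: str = "A",
    control: str = "B",
) -> float:
    """Mean outcome under `treated` minus mean outcome under `control`
    for ONE participant's repeated measurements.

    Assumes no carryover, no time trend, and no autocorrelation.
    """
    if len(outcomes) != len(arms):
        raise ValueError("outcomes and arms must have the same length")
    y_treated = [y for y, a in zip(outcomes, arms) if a == treated]
    y_control = [y for y, a in zip(outcomes, arms) if a == control]
    if not y_treated or not y_control:
        raise ValueError("both arms need at least one measurement")
    return mean(y_treated) - mean(y_control)


# One participant, three measurements per phase (AB order).
print(naive_apte([5.0, 6.0, 7.0, 3.0, 4.0, 5.0], ["A"] * 3 + ["B"] * 3))  # 2.0
```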
Hence, the original question translates to:
How can you generalize the mixed-effects estimate of one RCT participant to a person outside the RCT—or, at least, to another person in that same RCT?
N-of-1 trials can generate RCT results “from the bottom up”.
One common approach to combine results from different n-of-1 trial participants is hierarchical or multilevel modeling—commonly via the aforementioned mixed-effects model. It’s conceptually similar to conducting a meta-analysis of different studies.
Here, each study is a person’s own n-of-1 trial. And each n-of-1 trial is a crossover RCT on one person.
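To make the meta-analysis analogy concrete, here is a minimal two-stage sketch: each person’s n-of-1 trial first yields its own effect estimate and sampling variance, and these are then pooled as if they were separate studies. It swaps in the DerSimonian-Laird random-effects estimator, a standard meta-analysis method, rather than fitting a full mixed-effects model; the function name and example numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def dersimonian_laird(
    estimates: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float, float]:
    """Random-effects pooling of per-person effect estimates.

    estimates: one treatment-effect estimate per person's n-of-1 trial.
    variances: the sampling variance (squared standard error) of each estimate.
    Returns (pooled effect, its standard error, between-person variance tau2).
    """
    k = len(estimates)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two (estimate, variance) pairs")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    # Stage 1: inverse-variance (fixed-effect) weights and Cochran's Q.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    # Stage 2: moment estimate of how much true effects vary between people.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Stage 3: re-weight with tau2 added to each person's sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2


# Two people with different effect estimates and equal precision.
print(dersimonian_laird([1.0, 3.0], [1.0, 1.0]))  # (2.0, 1.0, 1.0)
```

The between-person variance `tau2` plays the same role as the random-effect variance in a mixed-effects model; a large `tau2` relative to the sampling variances is one signal that a single person’s effect may not transport well to another person.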
Suppose the statistical model and assumptions are exactly the same across multiple participants in their own n-of-1 trials. And suppose this model sufficiently removes any autocorrelation for any given participant (e.g., by including enough lag terms). If this is a linear model (as in traditional time series), then only the model coefficients are allowed to differ.
But in a mixed-effects model for repeated measures, we already have a name for these coefficients: random effects!
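Here is a minimal sketch of what “the same model, different coefficients” might look like for one participant: a linear model with one lag term, y_t = b0 + b1*treat_t + b2*y_{t-1} + e_t, fit by ordinary least squares, followed by a check of the residuals’ lag-1 autocorrelation. The simulated data, the block length, and all function names are assumptions for illustration, and the simulated treatment simply alternates in fixed blocks rather than being randomized. Note that b1 is the immediate (same-period) effect; with an autoregressive term, the long-run effect of staying on treatment is b1 / (1 - b2).

```python
import random
from typing import List, Sequence, Tuple


def ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (fine for a few columns)."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def lag1_autocorr(e: Sequence[float]) -> float:
    """Sample lag-1 autocorrelation of a residual series."""
    m = sum(e) / len(e)
    num = sum((a - m) * (b - m) for a, b in zip(e[:-1], e[1:]))
    den = sum((a - m) ** 2 for a in e)
    return num / den


def fit_participant(
    y: Sequence[float], treat: Sequence[int], include_lag: bool = True
) -> Tuple[List[float], float]:
    """Fit y_t = b0 + b1*treat_t [+ b2*y_{t-1}] for ONE participant.

    Returns the coefficients and the residuals' lag-1 autocorrelation.
    """
    rows = range(1, len(y))  # drop t = 0, which has no lagged outcome
    X = [[1.0, float(treat[t])] + ([y[t - 1]] if include_lag else []) for t in rows]
    target = [y[t] for t in rows]
    b = ols(X, target)
    resid = [yt - sum(bi * xi for bi, xi in zip(b, row)) for row, yt in zip(X, target)]
    return b, lag1_autocorr(resid)


def simulate_participant(
    n: int = 2000, effect: float = 2.0, phi: float = 0.5, block: int = 10, seed: int = 0
) -> Tuple[List[float], List[int]]:
    """Hypothetical AR(1) outcome; treatment switches on/off every `block`
    periods (fixed alternation, NOT randomized, for simplicity)."""
    rng = random.Random(seed)
    treat = [(t // block) % 2 for t in range(n)]
    y = [0.0]
    for t in range(1, n):
        y.append(1.0 + effect * treat[t] + phi * y[-1] + rng.gauss(0.0, 1.0))
    return y, treat


if __name__ == "__main__":
    y, treat = simulate_participant()
    b, rho = fit_participant(y, treat)
    print("with lag:    b =", [round(v, 2) for v in b], "residual lag-1 r =", round(rho, 2))
    _, rho_no_lag = fit_participant(y, treat, include_lag=False)
    print("without lag: residual lag-1 r =", round(rho_no_lag, 2))
```

Running `fit_participant` once per person gives each person their own coefficients, i.e., the no-pooling version of the random effects.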
More here: https://statsof1.org/resources/#aggregated-n-of-1-analyses
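The no-pooling versus partial-pooling distinction can also be sketched in a few lines. Given each person’s own (no-pooling) effect estimate and its sampling variance, partial pooling shrinks every estimate toward a common mean, more strongly when that person’s estimate is noisy or when people differ little from one another. This is a crude method-of-moments, empirical-Bayes version of what a mixed-effects model does when it predicts random effects, not a substitute for fitting one; the function name `partial_pool` and the example numbers are hypothetical.

```python
from statistics import mean, variance
from typing import List, Sequence, Tuple


def partial_pool(
    estimates: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float, List[float]]:
    """Shrink per-person effect estimates toward a precision-weighted common mean.

    Returns (common mean, between-person variance tau2, shrunken estimates).
    tau2 == 0 gives complete pooling; a very large tau2 approaches no pooling.
    """
    if len(estimates) < 2 or len(estimates) != len(variances):
        raise ValueError("need at least two (estimate, variance) pairs")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    # Crude method of moments: spread of the estimates minus average sampling noise.
    tau2 = max(0.0, variance(estimates) - mean(variances))
    w = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    # Each person keeps the fraction tau2 / (tau2 + v) of their deviation from mu.
    shrunk = [mu + tau2 / (tau2 + v) * (yi - mu) for yi, v in zip(estimates, variances)]
    return mu, tau2, shrunk


if __name__ == "__main__":
    mu, tau2, shrunk = partial_pool([1.0, 2.0, 3.0, 6.0], [1.0, 1.0, 1.0, 1.0])
    print(round(mu, 2), round(tau2, 2), [round(s, 2) for s in shrunk])
```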
The design is the treatment.
It is crucial to understand that the n-of-1 (time series) design itself is generalizable to a population of distinct people. By contrast, as mentioned above, the results from any particular n-of-1 trial are only meant to generalize to the average patterns in that individual’s history.
So a more compelling ATE of interest to clinical researchers might be:
How much does an n-of-1 trial design (the “treatment” design) improve personalization over another design like an RCT (the “control” design)?
For example: How much does an n-of-1 design—which allows models to vary across participants—improve estimation of heterogeneous treatment effects versus a traditional multilevel/hierarchical RCT that uses a mixed-effects model? The latter strictly imposes the same model structure on all participants.
Think of a similar design, the micro-randomized trial (MRT), which is used to find the best dynamic treatment regime or “treatment rule book” (i.e., set of treatment decision rules) that, when followed, sequentially tailors treatment to each trial participant. We use this design to estimate the population-level ATE over people in the target population who would use this treatment rule book versus those who would not.
Note that by design, each participant-specific set of treatment decisions based on this rule book is not generalizable. A participant’s own individualized treatment effect from sequentially making a particular set of treatment decisions doesn’t necessarily apply to others (except maybe to other participants who made the same exact sequential decisions).
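A toy illustration of this point: below, a hypothetical rule book is an ordered list of (condition, treatment) rules evaluated at each decision point, and two participants who follow the same rule book end up with different treatment sequences. The state variables (`stress`, `steps`), the thresholds, and the treatment names are made up for illustration. For simplicity the states are fixed inputs; in a real sequential setting each state would depend on earlier decisions, and this sketch omits the randomization an MRT itself performs.

```python
from typing import Callable, Dict, List, Sequence, Tuple

State = Dict[str, float]
Rule = Tuple[Callable[[State], bool], str]

# Hypothetical rule book: ordered (condition, treatment) pairs; first match wins.
RULE_BOOK: List[Rule] = [
    (lambda s: s["stress"] >= 7, "breathing_prompt"),
    (lambda s: s["steps"] < 3000, "walk_prompt"),
    (lambda s: True, "no_prompt"),  # default rule so every state gets a decision
]


def decide(state: State, rule_book: Sequence[Rule] = RULE_BOOK) -> str:
    """Return the treatment of the first rule whose condition the state meets."""
    for condition, treatment in rule_book:
        if condition(state):
            return treatment
    raise ValueError("no rule matched this state")


def follow(states: Sequence[State], rule_book: Sequence[Rule] = RULE_BOOK) -> List[str]:
    """Apply the same rule book at each of a participant's decision points."""
    return [decide(s, rule_book) for s in states]


if __name__ == "__main__":
    alice = [{"stress": 8, "steps": 1000}, {"stress": 3, "steps": 1500}, {"stress": 2, "steps": 5000}]
    bob = [{"stress": 2, "steps": 6000}, {"stress": 9, "steps": 7000}, {"stress": 4, "steps": 2000}]
    print(follow(alice))  # ['breathing_prompt', 'walk_prompt', 'no_prompt']
    print(follow(bob))    # ['no_prompt', 'breathing_prompt', 'walk_prompt']
```

The two printed sequences differ, but the object being shared, and therefore the thing whose effect could generalize, is `RULE_BOOK` itself.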
But the overall set of possible decision rules codified in the rule book is itself generalizable. In an MRT, the treatment rule book IS the treatment that is being tested for efficacy. Our related question is, what is the efficacy of implementing a per-participant n-of-1 design? The design itself is the treatment.
What do you think?
Comment below.👇🏽
(My thanks to Dr. Thomas Debray for prompting this important discussion. This piece is also published on Medium here.)
Footnotes
*a.k.a. “A/B test”
**a.k.a. “switchback experiment”
***i.e., the true set of all of that person’s relevant possible repeated measurements, most of which are unobserved; exactly like how most of the people in a target population are unobserved