Instruments of (Behavior) Change
Last updated: Mar 10, 2020
I finally got a chance to sit down with Neto et al (2016) for a quick digest of their innovative article, Towards personalized causal inference of medication response in mobile health: an instrumental variable approach for randomized trials with imperfect compliance. I’d met Dr. Elias Chaibub Neto at a 2018 Sage Bionetworks event. We immediately connected over our complementary ideas on n-of-1 (a.k.a., single-case/single-subject) causal inference.
Now—almost two years later—I finally carved out some time to see how their approach compared with mine in Daza (2018, 2019). My own approach set up a way to examine effects of would-be treatments or interventions on health outcome trends (e.g., an individual’s average longitudinal trends over a day under one treatment condition or exposure versus another), via an epidemiological tool called Robins’ g-formula.
But my approach assumed you could measure everything that strongly affected your health outcome—a pretty strong assumption! That said, it’s one that I hope is slowly but surely being satisfied by the sheer amount and increasing availability of longitudinal data on a single individual (e.g., electronic health and medical records, -omics data, small data).
Neto and his colleagues took an instrumental variables approach. An instrumental variable (IV) is basically a randomized variable that affects the treatment, but otherwise doesn’t change the outcome of interest. An IV also isn’t associated with anything that might have affected both the treatment and outcome (i.e., it isn’t associated with any confounders).
Here’s their Figure 1, to illustrate. \(Z_t\) is the IV, \(X_t\) is the treatment, \(Y_t\) is the outcome, and \(U_t\) is the set of confounders:
An IV approach can be useful, for example, when you want to prompt a study participant to change their behavior—but when their ability to do that thing (i.e., to comply with the recommended treatment) may change based on unobserved factors. Here, you’re most interested in how much changing their behavior (i.e., the treatment) improves one of their health indicators (i.e., the outcome)—and not in how much the prompt itself (i.e., the IV) improves that outcome.
For example, you randomly administer a push notification on the participant’s mobile phone (the IV) to tell them to go for a walk or run (the treatment). You hope this improves their sleep quality (the outcome).
But their ability to exercise may depend on many other factors that also affect their sleep quality; for example, how they’d slept the night before. “BUT,” you object, “they record their sleep quality on all nights of the study. So they definitely recorded the previous night’s sleep quality—which I can therefore account for in my analysis!” Well and good, but you may not have measured other important confounders, like caffeine intake or other consumed foods (or any gastrointestinal conditions that can affect both willingness to exercise and that night’s sleep).
By design, the randomized push notification wouldn’t have anything to do with any other factors that may have affected your participant’s ability to exercise. And if they’re compliant enough to follow the pushed messages at least some of the time, that’s good. (That is, the push notification isn’t a weak instrument.) Finally, if the push notification can’t itself affect sleep quality (e.g., by accidentally prompting the participant to do something else that keeps them awake later that night), then you may just have yourself an IV! (This last condition is called the exclusion restriction in IV-speak.)
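To make this concrete, here's a toy simulation of the push-notification setup (my own sketch, not Neto et al's method or code; all variable names and coefficients are made up for illustration). The randomized notification plays the role of \(Z_t\), an unobserved confounder plays \(U_t\), exercise plays \(X_t\), and sleep quality plays \(Y_t\). The naive exercised-vs-didn't comparison is biased by the confounder, while the classic Wald (ratio) IV estimator recovers something close to the true effect:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000  # number of days/time points (toy value)

# Z: randomized push notification (the IV) -- a fair coin flip each day
Z = rng.binomial(1, 0.5, n)
# U: unobserved confounder (e.g., GI trouble) affecting exercise AND sleep
U = rng.normal(0.0, 1.0, n)
# X: whether the participant exercised; nudged by Z, pushed around by U
X = (0.8 * Z + 0.5 * U + rng.normal(0.0, 1.0, n) > 0.5).astype(int)
# Y: sleep quality; the true causal effect of X is 1.0, and U contaminates it.
# Note that Z does NOT appear here: that's the exclusion restriction.
Y = 1.0 * X + 1.5 * U + rng.normal(0.0, 1.0, n)

# Naive comparison: biased upward, because exercisers tend to have higher U
naive = Y[X == 1].mean() - Y[X == 0].mean()

# Wald IV estimator: (effect of Z on Y) / (effect of Z on X)
wald = (Y[Z == 1].mean() - Y[Z == 0].mean()) / (X[Z == 1].mean() - X[Z == 0].mean())

print(f"naive: {naive:.2f}, IV (Wald): {wald:.2f}")
```

In a run like this, the naive estimate lands well above the true effect of 1.0, while the Wald estimate lands near it. The Wald estimator works precisely because the randomized \(Z_t\) is independent of \(U_t\), so dividing the instrument's effect on the outcome by its effect on the treatment cancels out the confounded pathway.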
Of course, the IV approach also has weaknesses. I’ve already alluded to these; for example:
- The IV is a weak instrument.
  - That is, the pushes hardly convince your study participant to exercise; the other factors are too strong of an influence. This might be mitigated by co-designing the pushes with the participant beforehand, to identify what kinds of messages they’re most likely to follow.
- The exclusion restriction is violated.
  - For example, unbeknownst to you, your participant quickly develops a habit of using their exercise push notifications as reminders to write short blog posts that day <ahem>. They don’t usually get to this until later that night—thereby keeping them awake later than usual.
- The push notification is associated with confounders.
  - You make sure your “push” algorithm accounts for the participant’s wake-up time that day; for example, by requiring the participant’s phone to randomize pushes only after they wake up. But their wake-up time could very well be related to their sleep quality the night before—a confounder that affects both today’s chances of exercising and tonight’s sleep quality.
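The weak-instrument problem, in particular, is easy to see in simulation. Here's a quick sketch (again my own toy, with hypothetical names and coefficients, not anything from the paper): when the instrument barely moves the treatment, the denominator of the Wald IV estimator hovers near zero, and the estimates become wildly unstable across replications.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000  # time points per simulated study (toy value)

def wald_estimate(z_strength):
    """One simulated n-of-1 study; z_strength is how hard the IV pushes treatment."""
    Z = rng.binomial(1, 0.5, n)                # randomized instrument
    U = rng.normal(0.0, 1.0, n)                # unobserved confounder
    X = (z_strength * Z + 0.5 * U + rng.normal(0.0, 1.0, n) > 0.5).astype(int)
    Y = 1.0 * X + 1.5 * U + rng.normal(0.0, 1.0, n)  # true effect of X is 1.0
    return (Y[Z == 1].mean() - Y[Z == 0].mean()) / (X[Z == 1].mean() - X[Z == 0].mean())

# Replicate each scenario many times and compare estimator spread
strong = [wald_estimate(1.0) for _ in range(200)]   # instrument moves X a lot
weak = [wald_estimate(0.05) for _ in range(200)]    # instrument barely moves X

print(f"SD of estimates -- strong IV: {np.std(strong):.2f}, weak IV: {np.std(weak):.2f}")
```

The weak-instrument estimates have a far larger spread, which is one way to motivate the co-design mitigation above: anything that strengthens the instrument's grip on the treatment shrinks that denominator problem.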
Also, Neto et al’s approach assumes the existence of a feasible IV. When none is available, one of the best things you can do is to account for all measured potential confounders, and hope you’ve measured the strongest ones. If you know certain confounders exist but couldn’t measure them, you could also conduct a sensitivity analysis by simulating these known-but-unmeasured confounders. In these situations, methods like Daza (2018, 2019) come in handy.
All that said, I do think Neto et al’s approach is quite insightful, and makes sense in many digital health behavior contexts. I plan on proposing it as a sensible next step to the paper Dr. Logan Schneider and I are working on for examining the putative effects of physical activity level on sleep duration and quality.
As for my own g-formula-based methods, look for an upcoming long (#haba) post with details! In the meantime, you can check them out at: