Research spotlight: Understanding and correcting bias in RWE studies
Jeremy Rassen, Sc.D.
Co-Founder, President, and Chief Science Officer, Aetion
Real-world evidence (RWE) studies are often criticized for their susceptibility to certain biases that, by design, don’t affect randomized controlled trials (RCTs). It’s important for RWE researchers to understand and correct these biases, both to advance the field and to reinforce that high-quality RWE can reliably inform decision-making.
Recently, William Murk, Ph.D., M.P.H., Sebastian Schneeweiss, M.D., Sc.D., and I conducted a study that addressed a surprising and likely incorrect finding in a previously published RWE analysis. Our work was published in Diabetes, Obesity, and Metabolism.
We initiated the study after reading two recent electronic health record (EHR) analyses published in the Journal of the American Medical Association (JAMA) that reported a 40 to 50 percent reduction in major adverse cardiovascular events (MACE) as soon as one month after bariatric surgery among patients with obesity and type 2 diabetes mellitus. Given the large effect size and the substantial difference between these results and those from recent RCTs, we sought to reproduce and refine one of these studies—Aminian et al.—to understand whether bias could have played a role in the findings.
We sought to address two major sources of potential bias in the Aminian study which, combined, could have led to the surprising result: the use of a nonsurgical control group, and the introduction of immortal time bias.
Below I address how we refined the original study, and what key methodological and study design factors real-world data (RWD) researchers must consider to support the generation of credible RWE.
Study design considerations
The Aminian paper compared surgical patients to nonsurgical patients. Choosing who gets surgery and who doesn’t is complicated; for example, some of the most severely ill patients may not receive surgery because they are deemed unlikely to survive the procedure. That decision requires care teams to make many clinical judgments, which may not be well documented in a patient’s EHR.
In addition, those who get surgery are closely monitored post-op, meaning that as we look at medical records, we’re more likely to see evidence of an outcome occurring than among nonsurgical patients. The result is a differential amount and quality of information collected on the two groups, a form of detection bias.
To address this in our study, we substituted a surgical comparator group. Rather than compare to nonsurgical patients, we chose patients who received hip or knee replacements as our control—a surgical procedure, but not one that has an obvious connection to MACE. We chose Optum EHR data as our data source because it provides a longitudinal view of the patient before, during, and after the admission during which the surgery took place.
We also addressed immortal time bias—a key problem with the original study—to build an optimal RWE design. The original study looked into the future and excluded any control patient who died within 30 days of the start of follow-up; doing so gave that group an “advantage” by removing some of the sickest patients with known negative outcomes.
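To make that mechanism concrete, here is a small simulation with entirely hypothetical numbers (not data from our study or from Aminian et al.): excluding patients known to die within the first 30 days mechanically lowers a group’s observed one-year mortality, flattering whichever arm the exclusion is applied to.

```python
import random

random.seed(0)

# Hypothetical control group: death times drawn from an exponential
# distribution, so roughly 10% die within the first 30 days.
n = 10_000
death_day = [random.expovariate(1 / 300) for _ in range(n)]

# Correct accounting: one-year mortality in the full group.
mortality_full = sum(d <= 365 for d in death_day) / n

# Biased design: peek at the future, exclude anyone who died in days 0-30,
# then compute one-year mortality among the remainder.
survivors = [d for d in death_day if d > 30]
mortality_biased = sum(d <= 365 for d in survivors) / len(survivors)

print(f"one-year mortality, full group: {mortality_full:.3f}")
print(f"after excluding early deaths:   {mortality_biased:.3f}")
```

Even in this toy setup the exclusion lowers the group’s apparent event rate; the distortion grows the more the earliest deaths are concentrated among the sickest patients.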
Along the way, we also improved upon the original study’s confounding adjustment strategy by measuring and adjusting for additional confounders, and by using high-dimensional propensity scores, an automated confounding adjustment technique.
With a firm design in place, we matched patients on a range of criteria including age, sex, body mass index, socioeconomic factors, smoking status, and history of cardiovascular events, then applied propensity-score adjustment against the control group of patients who underwent knee or hip replacement.
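As an illustration of the general technique (a minimal sketch, not the platform’s implementation—the cohort, coefficients, and hand-rolled logistic fit are all hypothetical), propensity-score matching estimates each patient’s probability of treatment from covariates and pairs each treated patient with the nearest-scoring control:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical cohort: age and BMI influence who gets treated.
n = 2000
age = rng.normal(55, 10, n)
bmi = rng.normal(35, 5, n)
X = np.column_stack([np.ones(n), (age - 55) / 10, (bmi - 35) / 5])

# Treatment assignment depends on covariates (made-up coefficients).
true_score = -1.4 + 0.5 * X[:, 1] + 0.8 * X[:, 2]
treated = rng.random(n) < 1 / (1 + np.exp(-true_score))

# Fit a logistic model for Pr(treated | covariates) by gradient ascent,
# a minimal stand-in for a standard propensity-score model.
w = np.zeros(3)
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ w))
    w += 0.5 * X.T @ (treated - p) / n
ps = 1 / (1 + np.exp(-X @ w))

# Greedy 1:1 nearest-neighbor matching on the propensity score.
t_idx = np.where(treated)[0]
c_pool = list(np.where(~treated)[0])
pairs = []
for i in t_idx[: min(len(t_idx), len(c_pool))]:
    j = min(c_pool, key=lambda k: abs(ps[k] - ps[i]))
    pairs.append((i, j))
    c_pool.remove(j)  # each control is used at most once

# Covariate balance should improve after matching.
t_m = [i for i, _ in pairs]
c_m = [j for _, j in pairs]
diff_before = abs(age[treated].mean() - age[~treated].mean())
diff_after = abs(age[t_m].mean() - age[c_m].mean())
print(f"age gap before matching: {diff_before:.2f} y, after: {diff_after:.2f} y")
```

In practice the model would include the full set of matching criteria listed above; the point of the sketch is that matching on the estimated score brings the measured covariates of the two groups into closer balance.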
This study required us to run five analyses, for which we used the Aetion Evidence Platform® (AEP). We had to work with the source data, perform measurements, and match patients across five different analyses with five different populations. AEP’s rapid-cycle analytics, workflows, and causal inference capabilities greatly reduced the time required for the analyses and created clear documentation of our methods and results.
Analysis and key findings
Before running our refined analyses, we reproduced the previously published EHR analysis in our system, and reached a similar result to the original authors.
As part of our study, we adjusted for factors including the intensity of health service use as measured in the period just prior to baseline. The quantity of health services used is a good general marker of how sick or well a patient is; the more health services a patient uses, the sicker they’re likely to be. We also added high-dimensional propensity scores, which incorporate an artificial intelligence-driven approach to identify possible confounding factors and augment human-chosen factors.
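The high-dimensional propensity score algorithm first screens the many codes recorded in claims or EHR data by how frequently they occur, then prioritizes the survivors by their potential to confound, using a Bross-style bias term that combines each code’s association with exposure and with the outcome. A minimal sketch on simulated data (all values hypothetical, not from the study):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical data: a binary exposure, a binary outcome, and many recorded
# codes (diagnoses, procedures, drugs) per patient.
n, n_codes = 5000, 200
code_prev = rng.uniform(0.01, 0.3, n_codes)
codes = rng.random((n, n_codes)) < code_prev   # patient-by-code indicator
exposure = rng.random(n) < 0.4
outcome = rng.random(n) < 0.1

# Step 1, prevalence filter: keep only the most commonly recorded codes.
candidates = np.argsort(codes.mean(axis=0))[::-1][:100]

# Step 2, prioritization: rank each candidate by a Bross-style multiplicative
# bias term combining its exposure association and its outcome association.
scored = []
for j in candidates:
    c = codes[:, j]
    p1 = c[exposure].mean()      # code prevalence among the exposed
    p0 = c[~exposure].mean()     # code prevalence among the unexposed
    rr = outcome[c].mean() / max(outcome[~c].mean(), 1e-9)  # code-outcome RR
    bias = (p1 * (rr - 1) + 1) / (p0 * (rr - 1) + 1)
    scored.append((abs(np.log(bias)), int(j)))

# The top-ranked codes would then enter the propensity-score model
# alongside the investigator-specified confounders.
top_codes = [j for _, j in sorted(scored, reverse=True)[:20]]
```

Because this toy data is pure noise, the ranking here is arbitrary; on real data, codes that are strongly tied to both treatment choice and the outcome rise to the top, which is what lets the method surface confounders no investigator thought to specify.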
After this series of refinements, we found that bariatric surgery had no discernible effect on MACE (HR=0.99; 95% CI 0.76-1.30). We did see, however, the importance of careful design and analysis of RWE studies, including the need for a target trial approach and thoughtful control for potential confounding factors.
Implications for future RWE research
Our analysis illustrates that proper study design is critical to controlling bias. It also highlights the key principles we must consider when working with RWD, as these data were created for purposes other than research.
In the end, our findings aligned more closely with the limited RCT evidence than did those of the original studies. As we continue to advance our understanding of where and how RWE can complement RCTs, this study builds upon findings from the RCT DUPLICATE project and other efforts to understand which kinds of RWE studies are likely to yield results that are meaningful and decision-ready.