The August 3, 2017 edition of The New England Journal of Medicine gives four reasons why the FDA, in particular, should look beyond the tried, true and perhaps shopworn randomized controlled trial (RCT) when evaluating medical devices and pharmaceuticals.
4 Reasons to Look Beyond RCTs

To date, RCTs have been virtually the sole determinant of which products will be graced with an FDA license, clearance or approval.
But that should change, said author Tom Frieden, M.D., former director of the Centers for Disease Control and Prevention (CDC).
“Although randomized, controlled trials (RCTs) have long been presumed to be the ideal source for data on the effects of treatment,” began Dr. Frieden, “other methods of obtaining evidence for decisive action are receiving increased interest, prompting new approaches to leverage the strengths and overcome the limitations of different data sources.”
One Size Does Not Fit All
RCTs are, in effect, the sole gatekeeper to receiving an FDA license, clearance or approval. That has created the conditions for a gap between the data the FDA receives and real-world physician experience with the resulting devices, drugs or biologics.
According to Dr. Frieden, “Although they can have strong internal validity, RCTs sometimes lack external validity; generalizations of findings outside the study population may be invalid. RCTs usually do not have sufficient study periods or population sizes to assess duration of treatment effect (e.g., waning immunity of vaccines) or to identify rare but serious adverse effects of treatment, which often become evident during postmarketing surveillance and long-term follow-up but could not be practically assessed in an RCT.”
To drive home his point that RCTs, while vital, need to be augmented with other evidence, observational studies in particular, Frieden offered four real-world examples in which RCTs failed to accurately describe the eventual efficacy or safety profile of FDA approved/cleared/licensed products.
Example #1: The Live Attenuated Influenza Vaccine Surprise
Live attenuated influenza vaccine, known as “nasal spray” influenza vaccine, was first licensed for commercial use by the FDA in 2003.
Since 2007, it’s been used in healthy children and adults 2 to 49 years of age.
RCTs conducted after licensing showed that live attenuated influenza vaccine was superior to inactivated influenza vaccine in children. On that basis, the CDC’s Advisory Committee on Immunization Practices (ACIP) recommended it for healthy children 2 to 8 years of age for the 2014–2015 influenza season.
Surprisingly, a subsequent observational study showed worse performance for live attenuated vaccine than was shown in the RCTs. More recently, according to Frieden, live attenuated vaccine was observed to have at or near zero efficacy, especially against the 2009 H1N1 pandemic influenza virus!
Frieden’s conclusion? “Changes in vaccine formulation (from trivalent to quadrivalent), the population vaccinated (e.g., natural immunity resulting in neutralization of live vaccine), or another factor or factors caused the RCT data to lack external validity and be misleading, as compared with prospectively collected vaccine-efficacy data.”
Example #2: When Top Societies Replaced RCT With Non-RCT Studies
The American Thoracic Society, World Health Organization, and Centers for Disease Control and Prevention recommend non-RCT studies as the standard of practice for evaluating the effects of tuberculosis treatment.
How did this happen?
Beginning in 1946, when an RCT of streptomycin set the stage for widespread treatment of tuberculosis, and continuing for over four decades, the British Medical Research Council, in collaboration with researchers all over the world, led a series of long-term RCTs for tuberculosis treatment.
In Frieden’s words, “Each trial built on previous findings, with the effect of refining drug regimens and minimizing the duration of antituberculosis treatment.”
When tuberculosis treatment moved from sanitariums to homes, the limitations of the RCT approach became clear. Specifically, the RCTs did not evaluate the health, epidemiologic and societal costs of relapse or of the rare emergence of drug-resistant tuberculosis.
RCTs failed to, in Frieden’s words, “establish a method of treatment that can be consistently applied to a large program in which thousands or millions of patients are treated.”
As a result, non-RCT studies are now standard practice. They include decision analyses of program effect, genotyping of isolates from patients in communities with different directly observed treatment practices, reviews of medical and public health records, and epidemiologic and laboratory analyses of multidrug-resistant tuberculosis outbreaks.
Example #3: When RCTs Are, by Design, Wrong
Sodium, according to more than 100 RCTs, can contribute to hypertension, which is a major risk factor for cardiovascular disease, the leading cause of death in the United States.
According to Frieden, “Meta-analysis of sodium-reduction trials of at least 6 months’ duration in which moderate reductions in intake were achieved, as well as well-designed, long-term cohort studies, have provided strong evidence that lower sodium intake is associated with a reduced incidence of cardiovascular events.”
But there’s a problem.
To create a valid sodium RCT, researchers have to design studies that require multiple 24-hour urine collections over a period of time. Any other approach, such as spot or single 24-hour urine collection, introduces unacceptable levels of intraindividual variability.
That design imperative creates a potentially fatal flaw. The results (less sodium means less hypertension) may not be translatable to a general population, as a growing chorus of physicians and researchers is postulating.
Again, Dr. Frieden: “Because of challenges in accurately measuring usual sodium intake and excretion and the potential for misclassification of exposure, cohort studies must use multiple 24-hour urine collections to be valid, and study designs that use population means, which are subject to less variation than measurements of individual intake, often provide more reliable information.”
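The statistical point behind multiple collections can be illustrated with a short sketch. It assumes a simple measurement-error model in which each 24-hour collection equals a person's usual excretion plus independent day-to-day noise. The variance figures and the function name are hypothetical and are not taken from Dr. Frieden's paper. Under that assumption, averaging k collections divides the within-person noise by k, so a larger share of the observed variation reflects true differences between people.

```python
def reliability(between_var: float, within_var: float, k: int) -> float:
    """Share of variance in a k-collection average that reflects
    true between-person differences in usual excretion.

    Assumes independent day-to-day noise, so averaging k collections
    shrinks the within-person variance to within_var / k.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return between_var / (between_var + within_var / k)


# Hypothetical variances, chosen only for illustration.
BETWEEN_VAR = 1.0  # variation in usual excretion between people
WITHIN_VAR = 3.0   # day-to-day variation within one person

for k in (1, 2, 4, 8):
    share = reliability(BETWEEN_VAR, WITHIN_VAR, k)
    print(f"{k} collection(s): reliability = {share:.2f}")
```

With these made-up numbers, a single collection yields a reliability of 0.25, while eight collections raise it to about 0.73. That is the sense in which a spot or single 24-hour sample can misclassify a person's usual intake.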
Example #4: Yes, Virginia, You CAN Un-bias Observational Studies
RCTs have become the dominant source of information for the FDA because they are designed, by way of blinded randomization, to distribute known and unknown factors evenly among control and intervention groups. The goal is to reduce the potential for bias or confounding.
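A small simulation can make the balancing effect of randomization concrete. This is only a sketch under invented assumptions: the patient ages, the rule that older patients are more likely to receive the drug in the observational arm, and the function names are all hypothetical. It compares the age gap between treated and untreated patients when prescribing depends on age with the gap when a coin flip decides.

```python
import random
from statistics import mean


def age_gap(ages, treated_flags):
    """Mean age of treated patients minus mean age of untreated patients."""
    treated = [a for a, t in zip(ages, treated_flags) if t]
    control = [a for a, t in zip(ages, treated_flags) if not t]
    return mean(treated) - mean(control)


def simulate(n=10_000, seed=0):
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    ages = [rng.uniform(40, 90) for _ in range(n)]
    # Assumed observational setting: the older the patient,
    # the more likely the drug is prescribed.
    observational = [rng.random() < (age - 40) / 50 for age in ages]
    # Randomized setting: a coin flip decides, regardless of age.
    randomized = [rng.random() < 0.5 for _ in ages]
    return age_gap(ages, observational), age_gap(ages, randomized)


obs_gap, rct_gap = simulate()
print(f"Age gap, observational assignment: {obs_gap:.1f} years")
print(f"Age gap, randomized assignment:    {rct_gap:.1f} years")
```

In the observational arm the treated group ends up markedly older, so any difference in outcomes could be due to age rather than the drug. Under randomization the gap shrinks to roughly zero, and the same would hold for factors nobody thought to measure.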
But what if there were techniques which would likewise reduce the potential for bias in non-RCT studies?
Two key stakeholders in this discussion, the Veterans Health Administration (VA) and Medicare, have attempted to do just that.
The test study looked at two classes of drugs for type 2 diabetes: sulfonylureas and thiazolidinediones. Instead of running an RCT, the researchers indexed and mined prescribing-pattern data in a manner that approximated an RCT, grouping patients who received a sulfonylurea or a thiazolidinedione according to prescribing frequency during the previous year. The study “n” was 80,000 patients and the duration was 10 years, making the study 20 times larger than any previous RCT comparing second-line diabetes drugs.
Results: 68% higher risk of avoidable hospitalization and a 50% higher risk of death associated with treatment with sulfonylureas vs. thiazolidinediones.
The VA is planning a similar study comparing chlorthalidone versus hydrochlorothiazide for hypertension.
In this case, researchers hope to index and mine primary care physicians’ electronic medical records (EMRs). The planned “n” is 13,500 veterans older than 65 years of age who are currently receiving hydrochlorothiazide. The researchers then plan to randomly change the prescription from hydrochlorothiazide to chlorthalidone for a portion of the patients and, from the EMRs, mine the results over a three-year study period.
Simple design. Low cost. Reduced bias.
New Tools, New Approaches
Frieden’s closing comment mirrors many of the arguments we’ve made in these pages. Top academic researchers, the regulatory agencies and forward-thinking senior executives at the major device and pharma companies realize that an over-reliance on single measures (i.e., the P value) or single study approaches (i.e., the RCT) has exacerbated two of the most vexing problems in medicine today: lack of reproducibility and increasing bias.
This is a thought provoking and important paper (Evidence for Health Decision Making — Beyond Randomized, Controlled Trials) and we commend it to all our readers.
