A recent post examining the utility of cricoid pressure and how to interpret non-inferiority trials stimulated discussion on Bayesian analysis and the question of whether a formal quantitative analysis is necessary or if an informal qualitative assessment is adequate. A recent reanalysis of the EOLIA trial published in JAMA by Goligher et al1 allows us to examine this question in more detail.
The EOLIA trial was an RCT published in NEJM examining the use of venovenous extracorporeal membrane oxygenation (VV ECMO) in patients with ARDS2. The authors enrolled patients with ARDS who had been receiving ventilation for less than 7 days and who had severe hypoxia (a P/F ratio of < 50 for 3 hrs or < 80 for 6 hrs) or hypercapnia (arterial blood pH < 7.25 and a PCO2 > 60) that was refractory to ventilator optimization. Patients were randomized to receive VV ECMO or standard care, which included a low tidal volume ventilatory strategy, neuromuscular blocking agents, and prolonged periods of prone positioning. Recruitment maneuvers, inhaled nitric oxide, inhaled prostacyclin, or intravenous almitrine were available when oxygenation objectives were not met. In cases of refractory hypoxia, patients in the control group were allowed to cross over to ECMO.
The trial enrolled 249 patients before it was halted prematurely, when a preplanned interim analysis determined that it would be futile to continue in pursuit of statistical significance for the primary endpoint, 60-day mortality. The authors reported that 60-day mortality was 35% in the ECMO group vs 46% in the control group (P=0.09). Multiorgan failure, respiratory failure, and septic shock were the main causes of death and were equally represented between the two groups. The authors reported an expected increase in the rates of thrombocytopenia and bleeding events in the ECMO group, but they found no increase in the rate of intracranial hemorrhage.
Effectively this was a negative trial, stopped early for statistical futility, with no hope of identifying a statistically significant difference in its primary outcome. But such an interpretation of the results leaves us feeling a bit unfulfilled. The authors reported an 11% absolute risk reduction (ARR) in 60-day mortality, favoring patients who were randomized to receive ECMO. Even with this noticeable difference in outcomes, the trial was halted early because the authors had powered the study to find a 20% ARR. And so we are left with a trial at serious risk of mistaking a clinically important benefit of ECMO for patients with ARDS for statistical noise.
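The gap between the effect the trial was powered for and the effect it observed can be made concrete with a rough power calculation. The sketch below uses a standard normal approximation for comparing two proportions. The helper name `power_two_proportions`, the arm size of 120 (roughly half of the enrolled patients), and the use of the observed 46% control mortality as the baseline are illustrative assumptions, not the trial's actual design parameters.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, p_treated, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Uses the unpooled standard error under the alternative hypothesis and
    ignores the negligible chance of rejecting in the wrong direction.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    diff = abs(p_control - p_treated)
    se = sqrt(
        p_control * (1 - p_control) / n_per_arm
        + p_treated * (1 - p_treated) / n_per_arm
    )
    return z.cdf(diff / se - z_alpha)


# Assumed inputs: 46% control mortality, about 120 patients per arm.
N_PER_ARM = 120
power_20 = power_two_proportions(0.46, 0.26, N_PER_ARM)  # 20% ARR
power_11 = power_two_proportions(0.46, 0.35, N_PER_ARM)  # 11% ARR

print(f"Power to detect a 20% ARR: {power_20:.2f}")
print(f"Power to detect an 11% ARR: {power_11:.2f}")
```

Under these assumptions the approximation gives power of roughly 90% for a 20% ARR but only about 40% for an 11% ARR, which is why a trial of this size can easily miss a smaller but still clinically important benefit.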
Several of the trial’s secondary endpoints suggest that it lacked adequate statistical power. Patients randomized to the control group experienced treatment failure (defined as death by 60 days or, in the case of the control group, crossover to ECMO) more frequently than those in the ECMO group. 28% of the patients in the control group eventually received VV ECMO. In addition, patients randomized to the ECMO group had significantly more days than those in the control group without prone positioning (59 vs. 46 days), renal replacement therapy (50 vs. 32 days; median difference), renal failure (46 vs. 21 days), or cardiac failure (48 vs. 41 days).
A Bayesian perspective of this trial reveals a more nuanced interpretation. A Bayesian analysis requires the establishment of a prior probability of efficacy of the treatment in question. This is based on the physiological plausibility of the treatment in question and any previous evidence examining its use. In this case, the data examining the use of VV ECMO in ARDS prior to the EOLIA trial is inconclusive.
A moderate number of observational studies support its use, but they suffer from the methodological issues associated with such study designs3,4,5. A multicenter RCT, the CESAR trial, published in the Lancet in 2008, demonstrated a modest benefit in favor of VV ECMO, but it suffered from important methodological issues that limited its ability to address the isolated efficacy of VV ECMO in ARDS6. A meta-analysis examining the entirety of the data found a moderate mortality benefit, but its results suffer from the same biases that were present in the original trials included in the analysis7.
Once a prior probability has been established, the current evidence is combined with the previous data to produce a posterior probability. With this in mind, how do we interpret the EOLIA trial? The results demonstrate a clear signal of benefit. Was it the 20% ARR expected by the authors when they powered the study? Clearly not, but EOLIA suggests the use of VV ECMO offers a survival benefit. The exact effect size is unclear, but it is likely modest (an ARR between 2% and 10%), taking into account the prior literature, the results of the EOLIA trial, and the understanding that both observational data and underpowered RCTs frequently overestimate the effect size of the therapy in question.
Let us compare this subjective assessment with a recently published quantitative Bayesian analysis of the EOLIA trial. In a paper published in JAMA, Goligher et al conducted a previously unplanned Bayesian reanalysis of the trial. The authors utilized two different approaches to establish a prior probability.
In the first model the prior probabilities were calculated to reflect varying degrees of enthusiasm and skepticism towards the underlying efficacy of ECMO. Each of the prior probabilities represents a hypothetical benefit of ECMO and the strength of the evidence supporting this value. For example, the moderately enthusiastic model assumes a modest benefit to the use of ECMO in ARDS which is supported by a small amount of data. It supposes an RR of death of 0.78. This risk ratio was based on a hypothetical trial of 264 patients that demonstrated a 22% decrease in RR of death. The authors’ probability models varied in the level of evidence (from 100-264 patients) and the prior probability of efficacy (RR of 0.67-1), accordingly. In addition, the authors established a prior probability dubbed the minimally informative reference prior, which assumed there was very little prior evidence to inform us of the benefits or harms of ECMO. In this case the posterior probability would essentially be dependent on data from the EOLIA trial alone.
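One simple way to see how a prior of this kind combines with trial data is a normal approximation on the log risk ratio scale, where the posterior is a precision-weighted average of the prior and the data. The sketch below is a simplified stand-in and not the model Goligher et al fit. The prior standard deviation of 0.2 and the event counts (42/120 and 55/120, back-calculated from the reported 35% and 46% mortality with an assumed 120 patients per arm) are illustrative assumptions, as are all function names.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def log_rr_and_se(events_t, n_t, events_c, n_c):
    """Log risk ratio (treated vs control) and its approximate standard error."""
    rr = (events_t / n_t) / (events_c / n_c)
    se = sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    return log(rr), se


def normal_update(prior_mean, prior_sd, data_mean, data_se):
    """Conjugate normal-normal update: a precision-weighted average."""
    w_prior = 1 / prior_sd ** 2
    w_data = 1 / data_se ** 2
    post_var = 1 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * data_mean)
    return post_mean, sqrt(post_var)


def prob_benefit(post_mean, post_sd):
    """Posterior probability that RR < 1, i.e. that log(RR) < 0."""
    return NormalDist(post_mean, post_sd).cdf(0.0)


# Illustrative counts, NOT the trial's exact data (see the assumptions above).
data_mean, data_se = log_rr_and_se(42, 120, 55, 120)

# Two hypothetical priors sharing an assumed SD of 0.2 on the log RR scale.
enthusiastic = normal_update(log(0.78), 0.2, data_mean, data_se)  # centered on RR 0.78
skeptical = normal_update(log(1.0), 0.2, data_mean, data_se)      # centered on no effect

for name, (m, s) in [("enthusiastic", enthusiastic), ("skeptical", skeptical)]:
    print(f"{name}: posterior RR {exp(m):.2f}, P(RR < 1) = {prob_benefit(m, s):.2f}")
```

The same data yield a more favorable posterior under the enthusiastic prior than under the skeptical one, which is the sensitivity that the range of priors is meant to expose.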
In the second model the authors used a meta-analysis of the previous trials7 on ECMO in ARDS to establish a probable treatment effect. They then down-weighted the strength of these prior probabilities based on assumptions made on the strength of the prior evidence. The idea is that these various prior probabilities represent the best estimate of the evidence supporting the use of ECMO for ARDS. When combined with data from the EOLIA trial, the resulting posterior distribution will lead to a more accurate depiction of the true efficacy of VV ECMO in ARDS.
The authors reported a host of posterior probabilities based on the various prior probability models. Unsurprisingly, the more optimistic the assumptions used to devise the prior probabilities, the more efficacious the treatment appeared following incorporation of EOLIA’s results. When the minimally informative reference prior was used, as one would expect, the results were fairly similar to those of the EOLIA trial itself. The point estimate for ARR was 10.6%. The authors reported that the probability that the ARR was at least 10% was approximately 50%. By comparison, the probability of a more meager ARR of at least 4% was 86%, and that of an ARR of at least 2% was as high as 92%. When the strongly enthusiastic model was utilized, the posterior probabilities increased accordingly. Conversely, when a strongly skeptical model was utilized, less optimistic posterior probabilities resulted.
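Posterior probabilities of this kind can be approximated with a short simulation. The sketch below places a flat Beta(1, 1) prior on the mortality in each arm, which is a simple minimally informative choice and not necessarily the reference prior used in the reanalysis, and then estimates the probability that the ARR exceeds a given threshold by sampling. The event counts (42/120 and 55/120) are back-calculated from the reported percentages with an assumed 120 patients per arm, so the output is illustrative only.

```python
import random


def prob_arr_at_least(events_t, n_t, events_c, n_c, threshold,
                      draws=20000, seed=0):
    """Monte Carlo estimate of P(ARR >= threshold) under flat Beta(1, 1) priors.

    ARR is defined as control mortality minus treated mortality.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        # Posterior for each arm is Beta(1 + deaths, 1 + survivors).
        p_t = rng.betavariate(1 + events_t, 1 + n_t - events_t)
        p_c = rng.betavariate(1 + events_c, 1 + n_c - events_c)
        if p_c - p_t >= threshold:
            hits += 1
    return hits / draws


# Illustrative counts, NOT the trial's exact data (see the assumptions above).
for threshold in (0.02, 0.04, 0.10):
    p = prob_arr_at_least(42, 120, 55, 120, threshold)
    print(f"P(ARR >= {threshold:.0%}) ~ {p:.2f}")
```

Because the prior carries almost no information here, the estimates are driven by the trial data alone: the probability of at least a small benefit is high, while the probability of a benefit as large as the point estimate is close to a coin flip.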
Goligher et al performed a fairly rigorous Bayesian analysis of the EOLIA trial. The extent to which they derived multiple models representing a variety of prior probabilities of benefit for ECMO in ARDS bolsters their results. Despite this rigor, there is a danger to such an analysis, in that it induces a false sense of statistical legitimacy that is not always justified. Whether the analysis is frequentist or Bayesian, it is not uncommon for statistically robust results to be disproven later when larger, more rigorous studies are performed, because the true prior probability of benefit was far lower than estimated.
In 2008, Kalil et al8 performed a Bayesian analysis of five therapies (intensive insulin therapy, recombinant human activated protein C, low tidal volume ventilation, low-dose steroids, and early goal-directed therapy [EGDT]) that at the time were thought to be beneficial based on sound physiological reasoning and a single statistically robust trial. Since the publication of this analysis, four of the five therapies have been disproven by larger, more methodologically robust studies. Unlike the EOLIA results, all but one of these therapies demonstrated impressively low p-values when the studies were analyzed using frequentist methods.
When Kalil et al utilized optimistic prior probabilities similar to those used in the EOLIA reanalysis, each of the four now-disproven therapies appeared likely to be beneficial. For example, the authors examined the benefit of early goal-directed therapy (EGDT), following the publication of the landmark trial by Rivers et al in NEJM9. When the authors utilized a minimally informative reference prior, they found fairly positive results: the probability that EGDT reduced mortality by 5% was 94%. It was only when the authors devised prior models that were far more skeptical than any of the models utilized in the EOLIA reanalysis that a much more realistic view of EGDT’s potential benefits became clear, and the probability of benefit fell as low as 0.4%. Similar findings were observed for intensive insulin therapy, recombinant human activated protein C, and low-dose steroids in sepsis.
The intention of this post is not to discourage the use of Bayesian statistics but rather to serve as a reminder that, although the complexity of such analyses provides a veneer of objectivity, the results are strongly tied to subjectively derived prior probabilities. Given this, one cannot help but ask: despite their methodological and statistical intricacies, do formal Bayesian analyses reveal a more accurate truth than the one we derived using a qualitative Bayesian assessment of the same literature? I imagine at times they reveal a signal that an unstructured assessment would have failed to detect. But they may also lead us astray: in their attempts to dampen statistical noise and random error, they can augment the confounding effects of non-random error.