EM Nerd-A Case of Confounding Factors

So much of our critique of the medical literature is focused on the quantification of random error, attempting to differentiate a true divergence of sample populations from random chance. Typically, we assess this probability via the p-value. This number represents the likelihood that the results observed, or more extreme results, would occur if no true difference existed between the two sample populations. Essentially, the p-value seeks to quantify the degree of random error present in any given sample. And while this is an important component of any statistical analysis, it often overshadows a far more important type of error: non-random error, or bias.
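To make the definition concrete, here is a minimal sketch of a two-sided p-value for a difference in mortality between two groups. It assumes a normal approximation (a two-proportion z-test), and the counts are hypothetical, chosen only for illustration; they do not come from any study discussed here.

```python
import math

def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value for a difference in proportions (normal approximation)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    # Under the null hypothesis both groups share one pooled event rate.
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (p_a - p_b) / se
    # Probability of a result at least this extreme if no true difference exists.
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical counts: 30 deaths in 100 patients vs 20 deaths in 100 patients.
p = two_proportion_p_value(30, 100, 20, 100)
print(f"p-value: {p:.3f}")
```

Note what this number does not capture: it says nothing about whether the two groups were comparable to begin with.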

Non-random error, or bias, stems from confounders that systematically drive the observed results away from the underlying reality. Essentially, a multitude of factors such as age, comorbid conditions, severity of illness, and countless others influence outcomes, sometimes even more than the variables we are attempting to measure. These very same factors also influence when specific therapies are utilized. Such confounders make it incredibly difficult to isolate the therapeutic effects of any one treatment. The process of random assignment of subjects into treatment groups is the optimal method to control for these potential sources of bias. The hope is that these confounders will be equally distributed between groups, leaving the therapy in question as the lone difference between groups (1). It is in this manner that we use randomization to isolate a treatment effect. But in some cases randomization proves logistically difficult. In these circumstances, authors often utilize statistical methods in an attempt to control for potential sources of bias.
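A toy simulation can illustrate why randomization is so effective at balancing confounders. This is only a sketch under made-up assumptions: each simulated patient has a single "severity" value between 0 and 1, and the two assignment rules (`coin_flip` and `clinician_choice`) are hypothetical stand-ins for a randomized trial and for bedside decision making.

```python
import random
from statistics import mean

def severity_gap(assign, n=10_000, seed=0):
    """Mean severity of treated patients minus mean severity of controls."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        severity = rng.random()  # hypothetical illness severity, 0 to 1
        (treated if assign(severity, rng) else control).append(severity)
    return mean(treated) - mean(control)

def coin_flip(severity, rng):
    # Randomized assignment ignores how sick the patient is.
    return rng.random() < 0.5

def clinician_choice(severity, rng):
    # Non-random assignment: sicker patients are more likely to be treated.
    return rng.random() < severity

randomized_gap = severity_gap(coin_flip)
observational_gap = severity_gap(clinician_choice)
print(f"severity gap, randomized:     {randomized_gap:+.3f}")
print(f"severity gap, clinician-led:  {observational_gap:+.3f}")
```

Under the coin flip the groups end up nearly identical in severity, while the clinician-led groups differ before any treatment effect has had a chance to act.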

In a recent article published in The Journal of Trauma and Acute Care Surgery, Inoue et al examined the efficacy of resuscitative endovascular balloon occlusion of the aorta (REBOA) in patients with hemodynamically unstable torso trauma (2). The authors conducted a retrospective analysis of the Japan Trauma Data Bank (JTDB), with data from 234 hospitals, on patients who underwent emergency surgery or interventional embolization due to injuries to their chest, abdomen or pelvis. Using this compiled dataset, the authors compared patients who received REBOA as a hemodynamic stopgap prior to definitive care to those who did not. In contrast to previous data (3), these authors observed a far higher in-hospital mortality in the patients who received REBOA when compared to controls (61.8% vs 45.3%). Some have offered that this increase in mortality may have been due to the prolonged time to definitive care (median of 97 minutes) observed in the REBOA group, leading to ischemic complications associated with vascular occlusion. And while this may be the case, I suspect there is a far simpler explanation.

It is important to remember this was a retrospective examination of patients who underwent REBOA compared to matched controls. No randomization was performed to determine who did and did not receive the treatment in question. Such a design allows for the potential of non-random error to shape findings. The reason patients who received REBOA did worse may have been because of the harmful effects of REBOA, but it is equally likely to have been caused by a multitude of other factors. In fact, the unadjusted comparison of the two samples reveals the patients who received REBOA were significantly sicker than their counterparts, having a higher HR, RR, and ISS and a lower BP and GCS on arrival. This intuitively makes sense, as there should be a certain degree of hemodynamic instability prior to the insertion of such an endovascular hemostatic device. In an attempt to adjust for these confounders, the authors performed what is known as a propensity score analysis.

A propensity score analysis attempts to quantify how baseline features are associated with the treatment allocation in question, in the hope that such a technique will eliminate all possible imbalances between the groups when subjects with similar propensity scores are compared (4). This involves first identifying the variables that predict how likely a patient is to receive the treatment in question (in this case REBOA). A model can then be derived, with increasing scores representing an individual’s increasing likelihood of receiving REBOA. Once individual scores have been generated for all patients in the dataset, patients with similar propensity scores who did and did not receive REBOA can be compared. The assumption is that comparing patients with similar propensity scores will eliminate the risk of bias. Patients whose scores have no matching control are discarded from the analysis.
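The steps described above can be sketched in a few dozen lines. This is an illustrative sketch, not the method Inoue et al used: it assumes a synthetic cohort with two made-up baseline covariates, a plain logistic regression fitted by gradient descent as the propensity model, and greedy 1:1 nearest-score matching within a caliper of 0.05. All function names and parameter values are hypothetical.

```python
import math
import random

def predict(w, x):
    """Estimated probability of treatment for covariates x under weights w."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=1.0, epochs=500):
    """Logistic regression by batch gradient descent (bias term first)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            err = predict(w, x) - t
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def match_pairs(scores, treated, caliper=0.05):
    """Greedy 1:1 nearest-score matching without replacement."""
    available = {i for i, t in enumerate(treated) if not t}
    pairs = []
    for i, t in enumerate(treated):
        if not t or not available:
            continue
        j = min(available, key=lambda c: abs(scores[c] - scores[i]))
        if abs(scores[j] - scores[i]) <= caliper:
            pairs.append((i, j))   # treated patient i, matched control j
            available.remove(j)    # each control is used at most once
        # Treated patients with no control inside the caliper are discarded.
    return pairs

def mean(values):
    return sum(values) / len(values)

# Synthetic cohort: two covariates in [0, 1]; sicker patients are treated more often.
rng = random.Random(42)
X = [(rng.random(), rng.random()) for _ in range(600)]
treated = [rng.random() < predict([-3.0, 2.0, 2.0], x) for x in X]

# Step 1: model treatment from baseline covariates. Step 2: score everyone.
weights = fit_logistic(X, [int(t) for t in treated])
scores = [predict(weights, x) for x in X]
# Step 3: compare only treated and untreated patients with similar scores.
pairs = match_pairs(scores, treated)

raw_gap = (mean([s for s, t in zip(scores, treated) if t])
           - mean([s for s, t in zip(scores, treated) if not t]))
matched_gap = mean([scores[i] - scores[j] for i, j in pairs])
print(f"{len(pairs)} matched pairs from {sum(treated)} treated patients")
print(f"score gap before matching: {raw_gap:.3f}, after: {matched_gap:.3f}")
```

The matched groups look alike on the score, but only on the covariates that were fed into the model. Anything the model never saw remains free to differ between groups.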

Inoue et al started with a total of 13,780 patients. Of these, 634 underwent REBOA during their resuscitation. After performing a propensity analysis which accounted for 70 different variables, the authors were left with 624 patients who received REBOA and 624 matched controls who did not undergo endovascular resuscitative efforts. Even with this statistical balancing act, the authors observed a 16.5% absolute increase in in-hospital mortality in the patients who underwent REBOA prior to definitive care. And while this propensity-matched cohort appears more balanced (similar vitals, GCS and ISS on presentation) than the original unfiltered dataset, such a statistical manipulation may not get us closer to the underlying truth.
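For readers who want to check the arithmetic, the short sketch below works from the in-hospital mortality figures quoted above (61.8% vs 45.3%). The helper name `risk_summary` is my own, and the output is purely descriptive: it measures the size of the gap, not its cause.

```python
def risk_summary(risk_treated, risk_control):
    """Absolute and relative differences between two event rates."""
    return {
        # Difference in percentage points between the two groups.
        "absolute_difference": risk_treated - risk_control,
        # Ratio of the two rates (relative risk).
        "relative_risk": risk_treated / risk_control,
    }

summary = risk_summary(0.618, 0.453)
print(f"absolute difference: {summary['absolute_difference'] * 100:.1f} percentage points")
print(f"relative risk: {summary['relative_risk']:.2f}")
```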

It is generally accepted that propensity score analyses, no matter the statistical complexity, cannot duplicate the powers of simple randomization. At very best they are only capable of controlling for confounders which are measured in the original dataset and included in the derived risk model. As such, propensity scores become less reliable the more complex the clinical question becomes. For simple questions that rely on data that can be easily collected from vital signs and laboratory values, propensity scores perform admirably. But in complex clinical situations which rely heavily on the unstructured judgment of the treating physician, their performance suffers. Such is the case when examining an intervention like REBOA. The Inoue et al study suffers from what is known as confounding by indication. This occurs when the allocation of a treatment in an observational study is not randomized and the indication for treatment is highly tied to downstream outcomes. In this case, the probability of a patient receiving REBOA was likely influenced by the patient’s initial appearance and their response to early resuscitative efforts. Critically ill patients who hemodynamically decompensate are typically the patients in whom REBOA will be utilized. These are the very same patients who will have poor outcomes based simply on their severity of injury. Inoue et al observed a 7.3% absolute increase in Emergency Department mortality in patients who underwent the endovascular procedure compared to controls (17.1% vs 9.7%). While such an increase could be due to complications of inserting the device, this seems unlikely considering the ischemic complications associated with its use are generally seen downstream. It is far more likely this increase in early mortality was due to a baseline imbalance in the acuity of the initial cohorts.
Thus the reason a patient received REBOA is the very same reason a patient had a poor outcome: REBOA is only performed on severely ill patients who are refractory to more conservative measures.
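Confounding by indication is easy to reproduce in a toy simulation. In the sketch below the treatment is harmless by construction: death depends only on a measured severity (standing in for something like ISS) and an unmeasured severity (standing in for bedside impression or response to early resuscitation). All probabilities are invented for illustration, and stratifying on deciles of the measured variable is used as a simple stand-in for propensity matching, not as a reproduction of the Inoue et al analysis.

```python
import random

def simulate(n=20_000, seed=1):
    """Return (measured severity, treated, died) for each simulated patient."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        measured = rng.random()     # recorded in the registry
        unmeasured = rng.random()   # seen at the bedside, never recorded
        # Both kinds of severity push the clinician toward treating.
        treated = rng.random() < (measured + unmeasured) / 2
        # Death depends on both severities and NOT on treatment.
        died = rng.random() < 0.1 + 0.3 * measured + 0.4 * unmeasured
        rows.append((measured, treated, died))
    return rows

def mortality_gap(rows):
    """Mortality in treated patients minus mortality in controls."""
    treated = [died for _, t, died in rows if t]
    control = [died for _, t, died in rows if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)

def adjusted_gap(rows, bins=10):
    """Mortality gap within strata of measured severity, weighted by stratum size."""
    total, weight = 0.0, 0
    for b in range(bins):
        stratum = [r for r in rows if min(int(r[0] * bins), bins - 1) == b]
        has_treated = any(t for _, t, _ in stratum)
        has_control = any(not t for _, t, _ in stratum)
        if has_treated and has_control:
            total += mortality_gap(stratum) * len(stratum)
            weight += len(stratum)
    return total / weight

rows = simulate()
crude = mortality_gap(rows)
adjusted = adjusted_gap(rows)
print(f"crude mortality gap:    {crude:+.3f}")
print(f"adjusted mortality gap: {adjusted:+.3f}")
```

Even after adjusting for everything that was measured, the treated group still appears to die more often, despite the treatment doing nothing at all in this simulation.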

And so the outcomes observed in any non-randomized trial have as much to do with patient selection as they do with the treatment effect in question. Take, for example, a study published by Moore et al in The Journal of Trauma and Acute Care Surgery in 2015 (3). The authors examined the use of REBOA as compared to resuscitative thoracotomy (RT) in patients with non-compressible torso trauma. These authors found that patients who received REBOA had a significantly higher rate of survival (37.5%) when compared to those who received an RT (9.7%). Additionally, the authors noted that a far higher proportion of patients in the REBOA group were discharged home (77.8%) compared to those who received RT (only 28.5%), the vast majority of whom were discharged to either a rehabilitation hospital (57.1%) or skilled nursing facility (14.4%). Similar to Inoue et al, this study did not utilize randomization to determine group assignment. As such, it is subject to the same potential biases and can be equally misleading in its estimation of the benefits and harms of REBOA. Since patients who received RT are generally a far sicker cohort, it is again difficult to isolate the benefits of REBOA.

REBOA is still an unknown commodity. Despite its statistical complexity, the Inoue et al dataset has done little to identify the exact patient population who might benefit from this endovascular stopgap. And while comparing it to RT is equally misleading, this contrast highlights the influence a comparator group has on the results found in non-randomized observational data. The Inoue et al data highlight a fact with which we are far too familiar: sick trauma patients will die despite our best efforts. Even with the authors' heroic statistical efforts, the question of whether and to what degree REBOA lessens this burden remains unanswered.

Sources Cited:

1. Altman DG, Bland JM. Uncertainty beyond sampling error. BMJ. 2014;349:g7065.
2. Inoue J, Shiraishi A, Yoshiyuki A, Haruta K, Matsui H, Otomo Y. Resuscitative endovascular balloon occlusion of the aorta might be dangerous in patients with severe torso trauma: A propensity score analysis. J Trauma Acute Care Surg. 2016;80(4):559-66.
3. Moore LJ, Brenner M, Kozar RA, et al. Implementation of resuscitative endovascular balloon occlusion of the aorta as an alternative to resuscitative thoracotomy for noncompressible truncal hemorrhage. J Trauma Acute Care Surg. 2015;79(4):523-30.
4. Freemantle N, Marston L, Walters K, Wood J, Reynolds MR, Petersen I. Making inferences on treatment effects from real world data: propensity scores, confounding by indication, and other perils for the unwary in observational research. BMJ. 2013;347:f6409.