“There is no mortality benefit for that.” How many times have you heard that? The implication is usually the same: the intervention is a waste of time, and a smart, evidence-based clinician wouldn't bother with it. But what does it actually mean if there is no proven mortality benefit?
Mortality benefit is elusive for several reasons
Several factors conspire to make it nearly impossible to prove mortality benefit in critical care:
#1. Mortality is decreasing.
Baseline mortality rates fall over time. This makes it increasingly difficult to prove that any intervention works. For example, imagine a drug that reduces relative risk of mortality by 25%:
- If the baseline mortality rate is 60%, then the drug should decrease mortality from 60% to 45%. Powering a study to detect a 15% absolute mortality difference shouldn't be that difficult.
- If the baseline mortality rate is 20%, then the drug would be expected to decrease mortality from 20% to 15%. Powering a study to detect a 5% absolute mortality difference requires a much larger sample size.
When designing a clinical trial, statisticians perform a power calculation to estimate how many patients they must recruit. Such calculations are based on previously reported mortality rates. Since mortality rates drop over time, the actual mortality in the study is usually below the expected rate. The absolute difference between groups is then smaller than planned, which leaves many studies underpowered.
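The effect of baseline mortality on sample size can be sketched with a textbook normal-approximation formula for comparing two proportions. This is only an illustration: the two-sided alpha of 0.05, the 80% power target, and the function names `patients_per_arm` and `achieved_power` are assumptions made here, not details taken from any trial discussed in this post.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def patients_per_arm(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to detect p_control vs p_treated."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = STD_NORMAL.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2)


def achieved_power(p_control, p_treated, n_per_arm, alpha=0.05):
    """Approximate power of a trial that enrolled n_per_arm patients per arm."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    standard_error = sqrt(variance / n_per_arm)
    return STD_NORMAL.cdf(abs(p_control - p_treated) / standard_error - z_alpha)


# Same 25% relative risk reduction at two different baseline mortality rates.
n_high_baseline = patients_per_arm(0.60, 0.45)
n_low_baseline = patients_per_arm(0.20, 0.15)
print(n_high_baseline, n_low_baseline)

# Trial sized for 20% baseline mortality, but observed baseline is 15%
# (hypothetical figure), with the same 25% relative risk reduction.
print(round(achieved_power(0.15, 0.1125, n_low_baseline), 2))
```

Under this approximation, the 60% baseline scenario needs roughly 170 patients per arm, while the 20% baseline scenario needs roughly 900. If that second trial then observes a 15% baseline mortality instead of the expected 20%, its power drops from the planned 80% to roughly 65%.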
#2. Most patients are unlikely to see any change in mortality.
We all go into critical care to save lives. However, actually saving lives isn't particularly common. Patients being admitted to an ICU may be broken into roughly three groups:
- Likely-to-die: Some patients have numerous comorbidities and a very high severity of illness. Their mortality is very high, regardless of our intervention.
- Unlikely-to-die: Many patients are fairly healthy, with lower illness severity. As long as they get decent care, they will survive. Outstanding care will get these patients better faster, with fewer complications (but it cannot improve their mortality).
- Borderline: Patients at intermediate risk of death. Differences in care might affect their survival.
Within any study, there will be many patients who are either likely-to-die or unlikely-to-die. These patients contribute noise, because the intervention is unlikely to affect their outcome. Only the borderline patients are able to provide meaningful information.
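A small calculation shows how the non-borderline patients dilute the signal. Every number in the case mix below is hypothetical and chosen purely for illustration (the shares, the control mortality rates, and the 25% relative risk reduction that applies only to borderline patients); the function name `trial_mortality` is likewise made up for this sketch.

```python
# Hypothetical case mix:
# group -> (share of enrolled patients, control mortality, relative risk reduction)
CASE_MIX = {
    "likely-to-die": (0.20, 0.80, 0.00),
    "unlikely-to-die": (0.50, 0.03, 0.00),
    "borderline": (0.30, 0.30, 0.25),
}


def trial_mortality(case_mix):
    """Return (control, treated) mortality averaged over the whole trial."""
    control = sum(share * risk for share, risk, _ in case_mix.values())
    treated = sum(share * risk * (1 - rrr) for share, risk, rrr in case_mix.values())
    return control, treated


control, treated = trial_mortality(CASE_MIX)
overall_arr = control - treated  # absolute risk reduction seen by the trial

_, borderline_risk, borderline_rrr = CASE_MIX["borderline"]
borderline_arr = borderline_risk * borderline_rrr  # reduction among borderline patients

print(f"whole trial: {overall_arr:.2%}, borderline only: {borderline_arr:.2%}")
```

With these made-up numbers, a 7.5-point absolute mortality reduction among borderline patients shrinks to a 2.25-point difference across the whole trial population, which is far harder to detect.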
#3. Patients die for numerous reasons.
From a physiologic perspective, mortality is a heterogeneous, composite outcome. For example, patients may die from a myocardial infarction for different reasons:
- Malignant arrhythmia
- Cardiogenic shock (pump failure)
- Infectious complication
- Hemorrhagic complication
Imagine we are trialing a drug that reduces malignant arrhythmia. Even if this drug is 100% effective at preventing arrhythmia, it would only be able to prevent a fraction of deaths. Inability to affect most causes of death could make it hard for this drug to have any measurable impact on all-cause mortality.
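The same point can be put in numbers. The sketch below assumes a hypothetical 10% baseline mortality and a made-up split of deaths across the four causes listed above; neither figure comes from real data, and `all_cause_mortality` is just an illustrative helper.

```python
# Hypothetical share of deaths attributable to each cause (sums to 1.0).
CAUSE_SHARE = {
    "malignant arrhythmia": 0.30,
    "cardiogenic shock": 0.40,
    "infectious complication": 0.15,
    "hemorrhagic complication": 0.15,
}


def all_cause_mortality(baseline, cause_share, target, efficacy):
    """All-cause mortality when a drug prevents `efficacy` of deaths from `target`."""
    prevented = baseline * cause_share[target] * efficacy
    return baseline - prevented


baseline = 0.10  # hypothetical baseline all-cause mortality
perfect_drug = all_cause_mortality(baseline, CAUSE_SHARE, "malignant arrhythmia", 1.0)
realistic_drug = all_cause_mortality(baseline, CAUSE_SHARE, "malignant arrhythmia", 0.5)
print(f"{perfect_drug:.1%} {realistic_drug:.1%}")
```

Under these assumptions, even a perfect anti-arrhythmic only moves all-cause mortality from 10% to 7%, and a drug that prevents half of arrhythmic deaths moves it from 10% to 8.5%.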
#4. We are desperately trying to keep patients alive.
Performing an animal study with a mortality outcome is simple. Injure the animals in some way; for example, introduce an infection. Perform an intervention on half of the animals. Stand back. Watch how many animals die in each group.
A clinical trial is infinitely more complex. Besides the intervention being studied, clinicians are trying furiously to keep the patients alive. Clinical management may negate the effects of the study intervention. For example, imagine a study comparing Plasmalyte versus saline. If clinicians are very diligent about treating hyperchloremic acidosis, this could negate differences observed between saline and Plasmalyte (Vincent 2016).
#5. The intervention is delivered too late to affect outcomes.
Early intervention is important for critically ill patients. Unfortunately, early intervention is difficult within an RCT. By the time patients have been recruited, consented, and randomized, it is usually late in the disease process (often >24 hours after admission). If a good intervention is delivered too late, it won't work.
#6. Many conditions are too rare to study.
Recruiting enough patients to show mortality benefit requires a big study. There are many rare conditions for which this is simply not feasible (e.g. toxic shock syndrome). The entire field of critical care toxicology is filled with heterogeneous and rare presentations, which are nearly impossible to study with a large RCT.
What interventions do have proven mortality benefit?
The above factors predict that it's nearly impossible to prove mortality benefit in critical care. What does the literature show? The great majority of RCTs with a mortality endpoint are negative. Ospina-Tascon 2008, Ridgeon 2016, and Landoni 2015 sifted through decades of critical care literature looking for multicenter RCTs showing a mortality benefit. Based on these studies, below is a list of medications with mortality benefit (1):
- Smaller studies, with fragility index <5 and p-values around 0.01-0.05. Although p<0.05 is technically “significant,” studies with borderline statistical significance often cannot be reproduced. Considering how many studies have been performed in total, some of these will be “positive” simply due to random chance.
- Massive studies with fragility index >5 and p-values <0.01. These studies are more convincing and less likely to be false positives.
This is a short list. The vast majority of medical interventions in critical care haven't been shown to improve mortality. This includes fundamental interventions we rely upon daily (e.g. vasopressors, blood transfusion, fluid resuscitation for hypovolemia, antibiotics). Therefore, it's naïve to propose that we shouldn't use an intervention because it hasn't been proven to improve mortality.
Many critical care trials are designed to look for a mortality benefit with a target p-value <0.05. This is a recipe for confusion:
- If the trial is positive, it usually winds up having a borderline p-value (e.g., 0.02-0.05) with a low fragility index (Ridgeon 2016). Although technically a “positive” trial, these results are not robust; they might represent chance alone. Shooting for a p-value <0.05 is a major cause of poor replicability. Some authors have proposed targeting a lower p-value (e.g. p<0.005) or a higher fragility index in order to improve reproducibility (Johnson 2013, Ridgeon 2016).
- If the trial is negative, this doesn't rule out a meaningful mortality benefit. Mortality is a profoundly important outcome, so even small differences in mortality are important (e.g. 1-5% difference). Unfortunately, most trials lack sufficient power to confidently exclude even a 10% mortality benefit (Harhay 2014). Investigators often predict that their intervention will cause an unrealistically large improvement in mortality, which leads to their studies being small and underpowered (a mistake known as “delta inflation”) (Ridgeon 2017).
In short, nearly all studies are underpowered to definitively address mortality. They are doomed from inception to either be weakly positive or weakly negative, failing to answer the intended question. This spawns meta-analyses, which attempt to combine underpowered studies – often with conflicting and indecisive results as well.
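For readers unfamiliar with the fragility index mentioned above: it is commonly described as the number of patients whose outcome would need to flip from survivor to death, in the arm with fewer deaths, before a two-sided Fisher's exact test is no longer significant. Below is a minimal sketch of that idea using only the standard library. It makes two simplifying assumptions: events are added to the arm with the lower event rate (identical to "fewer events" when the arms are the same size), and the trial numbers in the example are hypothetical.

```python
from math import comb


def fisher_exact_p(events_a, n_a, events_b, n_b):
    """Two-sided Fisher's exact test p-value for a 2x2 table of events by arm."""
    total_events = events_a + events_b
    lowest = max(0, total_events - n_b)
    highest = min(n_a, total_events)

    def weight(x):
        # Unnormalized hypergeometric probability of x events in arm A.
        return comb(n_a, x) * comb(n_b, total_events - x)

    observed = weight(events_a)
    # Sum the tables that are no more likely than the observed one.
    tail = sum(w for w in map(weight, range(lowest, highest + 1)) if w <= observed)
    return tail / comb(n_a + n_b, total_events)


def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Events to add to the lower-rate arm until significance is lost (0 if never significant)."""
    a_is_lower = events_a / n_a <= events_b / n_b
    flips = 0
    while fisher_exact_p(events_a, n_a, events_b, n_b) < alpha:
        if a_is_lower:
            if events_a == n_a:
                break  # no survivors left to convert
            events_a += 1
        else:
            if events_b == n_b:
                break
            events_b += 1
        flips += 1
    return flips


# Hypothetical trial: 150/1000 deaths with treatment vs. 200/1000 with control.
print(fragility_index(150, 1000, 200, 1000))
```

A trial whose result hinges on a handful of patients (a low fragility index) can be overturned by a few misclassified outcomes or patients lost to follow-up, which is why the index is used here as a rough gauge of robustness.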
Many therapies for sepsis have been rejected on the basis of a lack of mortality benefit. It's likely that some of these therapies have benefit, which couldn't be proven for reasons explored above. For example, an IL-1 receptor antagonist was shown to reduce mortality by 3%, but this intervention was rejected because the difference wasn't statistically significant (Opal 1997). A 3% absolute reduction in mortality would be clinically meaningful, but this study was underpowered to determine whether this was statistically significant (2).
The only trials for which a mortality endpoint could make sense are cardiology mega-trials or massive, pragmatic trials involving several thousand patients (e.g. CRASH-2). These studies have enough power to robustly investigate a mortality endpoint. Unfortunately, this sort of trial is rarely achieved in critical care.
More proximal endpoints may offer greater clarity.
The solution to this problem with mortality endpoints is to choose an endpoint that is more proximally related to the intervention (3). For example, in a study of ventilator weaning, ventilator-free days is more closely related to the intervention. Compared to mortality, ventilator-free days is more likely to produce a clear result:
- Extubation is much more common than death (e.g. 75% of patients may get extubated, whereas 15% of patients may die). This allows the investigator to analyze a larger signal from the same number of patients.
- Ventilator-free days is a continuous variable, rather than a binary variable (dead/alive). This provides more granular detail about the outcome over time, which will typically improve statistical power.
- Ventilator-free days is more closely related to the intervention, so there are fewer sources of noise interposed between the intervention and the endpoint.
A bit of perspective might help here. Non-mortality endpoints are uniformly accepted outside of the ICU (where a mortality endpoint is often impossible). However, among critically ill patients, non-mortality endpoints are often derided. This doesn't make sense. Just because the patient is in the ICU doesn't mean that everything other than survival suddenly ceases to matter.
Paradox: Mortality endpoint vs. proximal endpoint
- A proximal endpoint (e.g. ventilator-free days) is easier to investigate definitively, but it is less important to the patient.
- A mortality endpoint is more important to the patient, but it is often impossible to test definitively.
There is no simple answer to this riddle. If the study is underpowered, then it may be considered “scientifically useless” and potentially unethical (Halpern 2002)(4). Thus, it may be preferable to design an adequately powered study that definitively clarifies the effect on a proximal outcome, because at least this answers one question (rather than designing an underpowered study regarding mortality that doesn't answer any question).
The future of critical care: Focusing more on soft outcomes?
As critical care evolves, our goals mature beyond merely keeping patients alive. For example, if a patient with septic shock survives but develops end-stage renal failure, that's not a great outcome. The family may be thrilled that the patient survived, but I'm not – my goal is return of all organ functions. With ongoing progress, there will be a greater focus on “soft” outcomes such as:
- Avoidance of chronic renal insufficiency, heart failure, or pulmonary limitation
- Avoidance of delirium, depression, PTSD, or long-term cognitive dysfunction
- Improved strength, increased discharge to home, increased return to work
- More vent-free days and ICU-free days (surrogates for ICU-related morbidity)
These outcomes are hugely important to patients. None of them involves mortality. A narrow-minded focus solely on mortality ignores the benefits that we can offer our patients by improving these outcomes.
Summary
- Proving mortality benefit in critical care RCTs is extremely difficult for many reasons (e.g., falling baseline mortality rate, patient heterogeneity, heterogeneous causes of death, delayed initiation of study intervention, rarity of many conditions).
- Mortality benefit proven in double-blind RCTs exists for only a handful of critical care interventions. The vast majority of interventions that we use every day have no proven mortality benefit.
- Most studies aren't powered well enough to definitively prove or disprove a mortality benefit (e.g. achieve fragility index >5). The traditional approach of designing studies to look for mortality benefit with a target p-value of <0.05 is a formula for generating weak, poorly replicable studies.
- Proximal endpoints (e.g. ventilator-free days) are easier to investigate definitively, although they may be less meaningful than mortality.
References
- Petros AJ et al. Should morbidity replace mortality as an endpoint for clinical trials in intensive care? Lancet 1995.
- Ospina-Tascon GA et al. Multicenter, randomized, controlled trials evaluating mortality in intensive care: Doomed to fail? Critical Care Med 2008.
- Kress JP. Mortality is the only relevant outcome in ARDS: no. Intensive Care Med 2015.
- Ridgeon EE et al. Effect sizes in ongoing randomized controlled critical care trials. Critical Care 2017.
Acknowledgement: Thanks to Dr. Gilman Allen for thoughtful comments on this post.
Notes
(1) This list was generated by starting with interventions described in these papers, and then removing the following studies: studies involving devices (e.g. BiPAP), studies involving complex treatment regimens to which clinicians aren't blinded (e.g. intensive insulin, high tidal volume vs. low tidal volume), studies which have been disproven already, studies which used subgroup analysis, or studies irrelevant to current critical care. Annane 2018 was added, although I haven't performed an exhaustive review of studies over the last few years. I don't claim that this list is currently exhaustive, but it probably captures the majority of medical interventions supported by mortality benefit in multicenter RCTs.
(2) My personal bias is that IL-1 receptor antagonism would improve outcomes among a subset of patients with sepsis-HLH overlap syndrome.
(3) One caveat is that non-mortality endpoints must still be clinically meaningful. Pharma has an ignominious history of marketing drugs on the basis of non-patient-centered surrogate endpoints (e.g. hemoglobin A1C).
(4) The value of such trials hinges on how we interpret secondary endpoints, which is another topic for another blog. In short: if we insist that the only endpoint of any value is the primary endpoint, then an underpowered study is worthless and unethical. However, if we allow careful use of secondary endpoints, then the study may be quite useful (i.e., the secondary endpoints may allow an underpowered study to provide useful information).