“There is no mortality benefit for that.” How many times have you heard that? The implication is usually the same: that intervention is a waste of time. A smart, evidence-based clinician wouldn't bother with it. But what does it actually mean if there is no proven mortality benefit?
Mortality benefit is elusive for several reasons
Several factors conspire to make it nearly impossible to prove mortality benefit in critical care:
#1. Mortality is decreasing.
Baseline mortality rates fall over time. This makes it increasingly difficult to prove that any intervention works. For example, imagine a drug that reduces relative risk of mortality by 25%:
- If the baseline mortality rate is 60%, then the drug should decrease mortality from 60% to 45%. Powering a study to detect a 15% absolute mortality difference shouldn't be that difficult.
- If the baseline mortality rate is 20%, then the drug would be expected to decrease mortality from 20% to 15%. Powering a study to detect a 5% absolute mortality difference requires a much larger sample size.
When designing a clinical trial, statisticians perform a power calculation to estimate how many patients they must recruit. Such calculations are based on previously reported mortality rates. Since mortality rates drop over time, the actual mortality in the study is usually below the expected rate. This causes many studies to be under-powered.
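To make the arithmetic above concrete, here is a minimal sketch of a textbook sample-size calculation for comparing two proportions (normal approximation). The function name `patients_per_arm`, the two-sided alpha of 0.05, and the 80% power are assumptions chosen for illustration, not values taken from any particular trial.

```python
from math import ceil, sqrt
from statistics import NormalDist


def patients_per_arm(control_mortality, relative_risk_reduction,
                     alpha=0.05, power=0.80):
    """Approximate patients needed in EACH arm to detect the given effect.

    Uses the standard normal-approximation formula for two proportions.
    alpha (two-sided) and power are illustrative defaults (assumptions).
    """
    p1 = control_mortality
    p2 = control_mortality * (1 - relative_risk_reduction)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Same drug (25% relative risk reduction), two different baseline mortalities.
for baseline in (0.60, 0.20):
    print(f"baseline {baseline:.0%}: {patients_per_arm(baseline, 0.25)} per arm")
```

Under these assumptions, the 60% baseline scenario needs on the order of 170 patients per arm, whereas the 20% baseline scenario needs roughly 900 per arm. If the true baseline mortality turns out lower than the rate used in the calculation, the required number climbs further, which is how a trial ends up under-powered.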
#2. Most patients are unlikely to see any change in mortality.
We all go into critical care to save lives. However, actually saving lives isn't particularly common. Patients being admitted to an ICU may be broken into roughly three groups:
- Likely-to-die: Some patients have numerous comorbidities and a very high severity of illness. Their mortality is very high, regardless of our intervention.
- Unlikely-to-die: Many patients are fairly healthy, with lower illness severity. As long as they get decent care, they will survive. Outstanding care will get these patients better faster, with fewer complications (but it cannot improve their mortality).
- Borderline: Patients at intermediate risk of death. Differences in care might affect their survival.
Within any study, there will be lots of patients who are either likely-to-die or unlikely-to-die. These patients contribute noise, because the intervention won't likely affect their outcome. Only the borderline patients are able to provide meaningful information.
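A rough sketch of this dilution effect is shown below. Every number in `CASE_MIX` is invented for illustration (the shares of each group, their mortality, and a 25% relative risk reduction that applies only to borderline patients); none of it comes from a real trial.

```python
# Hypothetical case mix: group -> (share of enrolled patients,
#                                  control-arm mortality,
#                                  relative risk reduction from the drug).
# All values are assumptions made up for this example.
CASE_MIX = {
    "likely-to-die":   (0.20, 0.90, 0.00),
    "unlikely-to-die": (0.60, 0.02, 0.00),
    "borderline":      (0.20, 0.40, 0.25),
}


def trial_mortality(case_mix, treated):
    """Overall mortality in one arm, averaged over the case mix."""
    total = 0.0
    for share, mortality, rrr in case_mix.values():
        if treated:
            mortality = mortality * (1 - rrr)
        total += share * mortality
    return total


control = trial_mortality(CASE_MIX, treated=False)
treated = trial_mortality(CASE_MIX, treated=True)
print(f"control arm:  {control:.1%}")
print(f"treated arm:  {treated:.1%}")
print(f"absolute difference across the whole trial: {control - treated:.1%}")
```

In this made-up example the drug lowers mortality by 10 percentage points among borderline patients (40% to 30%), but the trial as a whole sees only a 2-point difference (27.2% versus 25.2%), which is far harder to detect.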
#3. Patients die for numerous reasons.
From a physiologic perspective, mortality is a heterogeneous, composite outcome. For example, patients may die from a myocardial infarction for different reasons:
- Malignant arrhythmia
- Cardiogenic shock (pump failure)
- Infectious complication
- Hemorrhagic complication
Imagine we are trialing a drug that reduces malignant arrhythmia. Even if this drug is 100% effective at preventing arrhythmia, it would only be able to prevent a fraction of deaths. Inability to affect most causes of death could make it hard for this drug to have any measurable impact on all-cause mortality.
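The ceiling on what such a drug can achieve is simple arithmetic, sketched below. The function name and the example numbers (10% baseline mortality, one quarter of deaths due to arrhythmia) are hypothetical. The result is a best case, since a patient spared an arrhythmic death may still die of another cause.

```python
def best_case_mortality(baseline_mortality, share_of_deaths_targeted, efficacy):
    """All-cause mortality if the drug prevents `efficacy` of the targeted deaths.

    This is an upper bound on benefit: it ignores competing causes of death.
    """
    deaths_prevented = baseline_mortality * share_of_deaths_targeted * efficacy
    return baseline_mortality - deaths_prevented


# Hypothetical: 10% baseline mortality, 25% of deaths are arrhythmic,
# and the drug is 100% effective against arrhythmia.
with_drug = best_case_mortality(0.10, 0.25, 1.0)
print(f"all-cause mortality falls from 10.0% to at best {with_drug:.1%}")
```

Even a perfect anti-arrhythmic drug would move all-cause mortality from 10% to 7.5% in this example, and a more realistic, partially effective drug would move it less.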
#4. We are desperately trying to keep patients alive.
Performing an animal study with a mortality outcome is simple. Injure the animals in some way; for example, introduce an infection. Perform an intervention on half of the animals. Stand back. Watch how many animals die in each group.
A clinical trial is infinitely more complex. Besides the intervention being studied, clinicians are trying furiously to keep the patients alive. Clinical management may negate the effects of the study intervention. For example, imagine a study comparing Plasmalyte versus saline. If clinicians are very diligent about treating hyperchloremic acidosis, this could negate differences observed between saline and Plasmalyte (Vincent 2016).
#5. The intervention is delivered too late to affect outcomes.
Early intervention is important for critically ill patients. Unfortunately, early intervention is difficult within an RCT. By the time patients have been recruited, consented, and randomized, it is usually late in the disease process (often >24 hours after admission). If a good intervention is delivered too late, it won't work.
#6. Many conditions are too rare to study.
Recruiting enough patients to show mortality benefit requires a big study. There are many rare conditions for which this is simply not feasible (e.g. toxic shock syndrome). The entire field of critical care toxicology is filled with heterogeneous and rare presentations, which are nearly impossible to study with a large RCT.
What interventions do have proven mortality benefit?
The above factors predict that it's nearly impossible to prove mortality benefit in critical care. What does the literature show? The great majority of RCTs with a mortality endpoint are negative. Ospina-Tascon 2008, Ridgeon 2016, and Landoni 2015 sifted through decades of critical care literature looking for multicenter RCTs showing a mortality benefit. Based on these studies, below is a list of medications with mortality benefit (1):
These studies tend to fall into one of two groups:
- Smaller studies, with fragility index <5 and p-value around 0.01-0.05. Although p<0.05 is technically “significant,” studies with borderline statistical significance often cannot be reproduced. Considering how many studies have been performed in total, some of these will be “positive” simply due to random chance.
- Massive studies with fragility index >5 and p-values <0.01. These studies are more convincing and less likely false-positives.
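For readers unfamiliar with the fragility index, here is a minimal sketch of how it is usually computed: the smallest number of patients in the arm with fewer events who would need to switch from "no event" to "event" before a two-sided Fisher exact test stops being significant. The function names and the example trial (10/100 versus 25/100 deaths) are hypothetical, and the Fisher test is hand-rolled from the hypergeometric distribution so that the example needs only the standard library.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables (with the same margins) that are
    no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / total_tables
             for x in range(lo, hi + 1)}
    p_observed = probs[a]
    # Small tolerance so that ties are not lost to floating-point rounding.
    return min(1.0, sum(p for p in probs.values()
                        if p <= p_observed * (1 + 1e-7)))


def fragility_index(events_1, n_1, events_2, n_2, alpha=0.05):
    """Event flips needed in the lower-event arm to lose significance.

    Returns 0 if the result is not significant to begin with.
    """
    if events_1 > events_2:  # make arm 1 the arm with fewer events
        events_1, n_1, events_2, n_2 = events_2, n_2, events_1, n_1
    for flips in range(n_1 - events_1 + 1):
        p = fisher_exact_two_sided(events_1 + flips, n_1 - events_1 - flips,
                                   events_2, n_2 - events_2)
        if p >= alpha:
            return flips
    raise ValueError("result stays significant after flipping every non-event")


# Hypothetical trial: 10/100 deaths with treatment vs 25/100 with control.
print("fragility index:", fragility_index(10, 100, 25, 100))
```

A low result means that a handful of different outcomes would have turned a "positive" trial into a "negative" one.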
This is a short list. The vast majority of medical interventions in critical care haven't been shown to improve mortality. This includes fundamental interventions we rely upon daily (e.g. vasopressors, blood transfusion, fluid resuscitation for hypovolemia, antibiotics). Therefore, it's naïve to propose that we shouldn't use an intervention because it hasn't been proven to improve mortality.
For nearly all studies, mortality is a foolish primary endpoint.
Many critical care trials are designed to look for a mortality benefit with a target p-value <0.05. This is a recipe for confusion:
- If the trial is positive, it usually winds up having a borderline p-value (e.g., 0.02-0.05) with a low fragility index (Ridgeon 2016). Although technically a “positive” trial, these results are not robust – they might represent chance alone. Shooting for a p-value <0.05 is a major cause of poor replicability. Some authors have proposed targeting a lower p-value (e.g. p<0.005) or a higher fragility index in order to improve reproducibility (Johnson 2013, Ridgeon 2016).
- If the trial is negative, this doesn't rule out a meaningful mortality benefit. Mortality is a profoundly important outcome, so even small differences in mortality are important (e.g. 1-5% difference). Unfortunately, most trials lack sufficient power to confidently exclude even a 10% mortality benefit (Harhay 2014). Investigators often predict that their intervention will cause an unrealistically large improvement in mortality, which leads to their studies being small and underpowered (a mistake known as “delta inflation”) (Ridgeon 2017).
In short, nearly all studies are underpowered to definitively address mortality. They are doomed from inception to either be weakly positive or weakly negative, failing to answer the intended question. This spawns meta-analyses, which attempt to combine underpowered studies – often with conflicting and indecisive results as well.
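One way to see why a "negative" trial often settles nothing is to look at the confidence interval around the mortality difference. The sketch below uses a simple Wald interval and an invented trial (200 patients per arm, 30% versus 27% mortality); the function name and all numbers are assumptions for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_ci(deaths_control, n_control, deaths_treated, n_treated,
                       confidence=0.95):
    """Absolute mortality reduction with a Wald confidence interval.

    A positive difference means fewer deaths in the treated arm.
    """
    p_c = deaths_control / n_control
    p_t = deaths_treated / n_treated
    diff = p_c - p_t
    se = sqrt(p_c * (1 - p_c) / n_control + p_t * (1 - p_t) / n_treated)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff, diff - z * se, diff + z * se


# Hypothetical trial: 60/200 deaths with control vs 54/200 with treatment.
diff, low, high = risk_difference_ci(60, 200, 54, 200)
print(f"absolute reduction {diff:.1%} (95% CI {low:.1%} to {high:.1%})")
```

In this made-up trial the interval runs from roughly a 6% absolute increase in mortality to roughly a 12% absolute decrease. The result is "not significant," yet it cannot exclude either meaningful harm or a 10% mortality benefit.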
Many therapies for sepsis have been rejected on the basis of a lack of mortality benefit. It's likely that some of these therapies have benefit, which couldn't be proven for reasons explored above. For example, an IL-1 receptor antagonist was shown to reduce mortality by 3%, but this intervention was rejected because the difference wasn't statistically significant (Opal 1997). A 3% absolute reduction in mortality would be clinically meaningful, but this study was underpowered to determine whether this was statistically significant (2).
The only trials for which a mortality endpoint could make sense are cardiology mega-trials or massive, pragmatic trials involving several thousand patients (e.g. CRASH-2). These studies have enough power to robustly investigate a mortality endpoint. Unfortunately, this sort of trial is rarely achieved in critical care.
More proximal endpoints may offer greater clarity.
The solution to this problem with mortality endpoints is to choose an endpoint that is more proximally related to the intervention (3). For example, in a study of ventilator weaning, ventilator-free days is more closely related to the intervention. Compared to mortality, ventilator-free days is more likely to produce a clear result:
- Extubation is much more common than death (e.g. 75% of patients may get extubated, whereas 15% of patients may die). This allows the investigator to analyze a larger signal from the same number of patients.
- Ventilator-free days is a continuous variable, rather than a binary variable (dead/alive). This provides more granular detail about the outcome over time, which will typically improve statistical power.
- Ventilator-free days is more closely related to the intervention, so there are fewer sources of noise interposed between the intervention and the endpoint.
A bit of perspective might help here. Non-mortality endpoints are uniformly accepted outside of the ICU (where a mortality endpoint is often impossible). However, among critically ill patients, non-mortality endpoints are often derided. This doesn't make sense. Just because the patient is in the ICU doesn't mean that everything other than survival suddenly ceases to matter.
Paradox: Mortality endpoint vs. proximal endpoint
Investigators designing an RCT face a paradox:
- A proximal endpoint (e.g. ventilator-free days) is easier to investigate definitively, but it is less important to the patient.
- A mortality endpoint is more important to the patient, but it is often impossible to test definitively.
There is no simple answer to this riddle. If the study is underpowered, then it may be considered “scientifically useless” and potentially unethical (Halpern 2002)(4). Thus, it may be preferable to design an adequately powered study that definitively clarifies the effect on a proximal outcome, because at least this answers one question (rather than designing an underpowered study regarding mortality that doesn't answer any question).
The future of critical care: Focusing more on soft outcomes?
As critical care evolves, our goals mature beyond merely keeping patients alive. For example, if a patient with septic shock survives but develops end-stage renal failure, that's not a great outcome. The family may be thrilled that the patient survived, but I'm not – my goal is return of all organ functions. With ongoing progress, there will be a greater focus on “soft” outcomes such as:
- Avoidance of chronic renal insufficiency, heart failure, or pulmonary limitation
- Avoidance of delirium, depression, PTSD, or long-term cognitive dysfunction
- Improved strength, increased discharge to home, increased return to work
- More vent-free days and ICU-free days (surrogates for ICU-related morbidity)
These outcomes are hugely important to patients. None of them involves mortality. A narrow-minded focus solely on mortality ignores the benefits that we can offer our patients by improving these outcomes.
- Proving mortality benefit in critical care RCTs is extremely difficult for many reasons (e.g., falling baseline mortality rate, patient heterogeneity, heterogeneous causes of death, delayed initiation of study intervention, rarity of many conditions).
- Only a handful of critical care interventions have a mortality benefit proven in double-blind RCTs. The vast majority of interventions that we use every day have no proven mortality benefit.
- Most studies aren't powered well enough to definitively prove or disprove a mortality benefit (e.g. achieve fragility index >5). The traditional approach of designing studies to look for mortality benefit with a target p-value of <0.05 is a formula for generating weak, poorly replicable studies.
- Proximal endpoints (e.g. ventilator-free days) are easier to investigate definitively, although they may be less meaningful than mortality.
References:
- Petros AJ et al. Should morbidity replace mortality as an endpoint for clinical trials in intensive care? Lancet 1995.
- Ospina-Tascon GA et al. Multicenter, randomized, controlled trials evaluating mortality in intensive care: Doomed to fail? Critical Care Med 2008.
- Kress JP. Mortality is the only relevant outcome in ARDS: no. Intensive Care Med 2015.
- Ridgeon EE et al. Effect sizes in ongoing randomized controlled critical care trials. Critical Care 2017.
Acknowledgement: Thanks to Dr. Gilman Allen for thoughtful comments on this post.
Notes
(1) This list was generated by starting with interventions described in these papers, and then removing the following studies: studies involving devices (e.g. BiPAP), studies involving complex treatment regimens to which clinicians aren't blinded (e.g. intensive insulin, high tidal volume vs. low tidal volume), studies which have been disproven already, studies which used subgroup analysis, or studies irrelevant to current critical care. Annane 2018 was added, although I haven't performed an exhaustive review of studies over the last few years. I don't claim that this list is currently exhaustive, but it probably captures the majority of medical interventions supported by mortality benefit in multicenter RCTs.
(2) My personal bias is that IL-1 receptor antagonism would improve outcomes among a subset of patients with sepsis-HLH overlap syndrome.
(3) One caveat is that non-mortality endpoints must still be clinically meaningful. Pharma has an ignominious history of marketing drugs on the basis of non-patient-centered surrogate endpoints (e.g. hemoglobin A1C).
(4) The value of such trials hinges on how we interpret secondary endpoints, which is another topic for another blog. In short: if we insist that the only endpoint of any value is the primary endpoint, then an underpowered study is worthless and unethical. However, if we allow careful use of secondary endpoints, then the study may be quite useful (i.e., the secondary endpoints may allow an underpowered study to provide useful information).
Not sure if I am misreading this but in your chart of drugs that have mortality benefit it looks like the treatment group and control group were reversed in the hydrocortisone/septic shock trial?
Great post Josh, and I totally agree. The more resuscitative literature I read, the more I need to ask myself whether it is appropriate to use a 90-day mortality benchmark as a reason not to use a therapy, given all the confounding outside of the unit. Let's face it: the people we take care of in the intermediate to high risk group that you described become medically frail in the immediate aftermath. Even if we can guide them out of the unit safely, many retrospective and prospective studies (albeit fewer) looking at frail patients show the importance of good…
yes you are right, thanks for finding that error! (I’ve fixed it)
Excellent post, Josh.
Unfortunately, ventilator-free days have their own flaws as an outcome measure. A recent CCM article is worth reading —> DOI: 10.1097/CCM.0000000000002890
Thanks. I agree, any outcome measure has its limitations. In particular, caution should be used when evaluating vent-free days if there is a trend towards increased mortality in the intervention group (this issue was noted by the original authors who developed this metric).
There may not be any single outcome measure which is perfect, but rather it may be optimal to consider a couple outcomes together (with consideration of clinical context and mechanism of action).
Hi Josh, great post as always. I think point #2 is key and is even more important than you make out. Heterogeneity of baseline risk is huge and not something we talk about enough. The obvious example is sepsis – urosepsis and respiratory sepsis are very different diseases with totally different prognoses, but are lumped together in severity scoring algorithms (e.g. APACHE II; and only partially corrected in APACHE III). This is also the case in the APROCCHSS* & ADRENAL trials – reporting very different outcomes in a different cohort of patients (APROCCHSS ~60% chest source and 46% avg. mortality…
Thanks, I agree, heterogeneity is a huge issue. I was considering adding another number (#7) to discuss problems that we have with rather broad definitions (e.g. “ARDS” and “sepsis”). As one further example of this, consider the following two patients who might both be diagnosed with “sepsis”: #1 – Young patient, previously healthy, with toxic shock syndrome. Hyperdynamic circulation, excellent catecholamine reserve, extremely elevated lactate, blood pressure maintained, very tachycardic. #2 – Elderly patient, chronic renal disease, baseline ejection fraction 20%. Develops shock following urinary tract infection. Hypodynamic circulation, poor catecholamine reserve, slightly elevated lactate, BP profoundly low, heart rate…
Very timely post. Wondering if the impetus for the writing was the March 1 issue of NEJM that had two articles about balanced fluids vs NS (Semler & Self et al.)? Your thoughts seem applicable to them.
The most direct impetus was the ADRENAL and APROCCHSS trials (more to come about these trials in the next two posts). However, this post may be relevant to a lot of studies coming out in the near future. I am concerned that many of these studies with a primary endpoint of mortality will be “negative” – leading us to inappropriately disregard interventions that actually work.
Maybe an option is to use a disease oriented primary endpoint with mortality (or something else) as a secondary endpoint? The short term primary endpoint could be designed to try and shed light on the science behind the medicine, while the “farther out” secondary endpoint could capture data on how the two endpoints relate. A repeat of the ATHOS-3 trial could be structured with increased MAP as the primary (without the methodological gymnastics they used in the original) with mortality as a secondary. If the drug improves MAP (probably true) but fails to improve mortality, that sets you up to…
Agree that the primary endpoint for most trials should be something more proximal than mortality. This gets very tricky, though, because we also want endpoints that are clinically meaningful (ideally patient-centered etc). I don’t think MAP alone is a meaningful endpoint in ATHOS-3 (there are lots of ways to elevate the MAP, and few patients truly have refractory hypotension, so mere MAP elevation isn’t tremendously helpful). Theory and some post-hoc data suggest that Ang-II improves renal function. If studying Ang-II, I would be tempted to use something like MAKE30 as a primary endpoint (a composite of death, AKI, or dialysis).
excellent, Josh
thank you once again
Great post – it’s so important to have an ongoing conversation on how we use evidence to drive clinical decisions in critical care. In my experience, central to all this is the laziness of a “p-value dichotomy.” We think of studies as “positive or negative” based on a p-value, particularly for a mortality endpoint. If only the importance we assign to p-values could be shifted more towards focusing on reproducibility, mechanistic plausibility, and effect sizes… An improvement in a proximal outcome with a trend towards mortality reduction is a decent indicator of efficacy, provided the confidence interval is narrow enough. Creating…
This is an interesting post. On a long enough timeline, the mortality of every patient rises to 100% despite any intervention; and in the critical care population specifically that timeline is much shorter. This line of thinking creates a hypothesis then, that CC literature is full of interventions that are thought of as ineffective but are in fact useful and this literature is then “ripe for the picking” for re-investigation of such interventions.
Interesting stuff. In a similar vein to another comment the issue in critical care is our inability to phenotype our patients that are put into clinical trials. Oncology trials do this and are comparatively much smaller yet not infrequently show mortality benefits. Perhaps we should take a step back and see how we can better describe our patients. We may find that some “disproven” therapies truly work in the right patients and undoubtedly a slew of new options would be found along the way.