PulmCrit - I’m upset about the RELAX trial

Some studies have reported a gradual upwards trend in the PEEP being used for intubated patients without ARDS (e.g., from 5 cm to 7 cm). Is this good or bad? That is the question which the RELAX trial embarked upon answering. It’s a complex question which probably doesn’t have a binary answer… but never mind – let’s get to the study.

setting the stage – patients & interventions

This is a multi-center, open-label RCT which randomized 969 patients who were intubated shortly before or after ICU admission. Some notable exclusion criteria were ARDS, severe COPD, restrictive lung disease, morbid obesity, or neurologic diagnosis which could prolong the duration of mechanical ventilation.

Patient characteristics are shown above. A few points are notable:

Only about 30% of patients were intubated because of respiratory failure. Most patients were intubated for other reasons (e.g., airway protection, planned postoperative ventilation, or cardiac arrest).
There is some imbalance between the groups. More patients in the low-PEEP group had planned postoperative ventilation (78 vs. 59). Conversely, more patients in the high-PEEP group had cardiac arrest (142 vs. 123). Both of these imbalances will tend to cause patients in the low-PEEP group to fare better.

Patients were randomized to a low-PEEP strategy (which involved minimizing the PEEP as much as possible, with a target of zero PEEP) versus a high-PEEP strategy (which largely involved letting patients cruise at a PEEP of 8 cm):

The effect of this intervention on PEEP is shown below. The low-PEEP group had an average PEEP of ~4 cm, whereas the high-PEEP group had an average PEEP of ~8 cm. There was more variation in PEEP over time in the low-PEEP group, since the low-PEEP protocol involved a greater intensity of PEEP titration.

overall design & primary endpoint

Now that you have a sense of the study, let’s talk about the overall study design and the primary endpoint.

The primary endpoint is the number of ventilator-free days. This doesn’t really make sense. Do we truly expect that a PEEP of 4 cm vs. 8 cm will affect timing of extubation in a postoperative patient, or a patient intubated for airway protection? Most patients in the study weren’t intubated for respiratory failure – so their timing of extubation is likely to be determined by non-pulmonary considerations. So a priori, it is extremely likely that the primary endpoint will be neutral (i.e., small tweaks in the PEEP won’t affect patient outcome).

In order to generate a “positive” study using this endpoint, the authors used a noninferiority design. The purpose of a noninferiority design is to test whether a new treatment is “not unacceptably worse” than the current therapy. For example, a noninferiority design could make sense to test a cheaper or safer drug – a drug with obvious tangible benefits that would make it an attractive choice if it achieved equivalent outcomes compared to standard therapy. However, a noninferiority design doesn’t make sense for comparing two PEEP levels. There is nothing cheaper, easier, or inherently better about using less PEEP – so even if lower PEEP is genuinely noninferior, that shouldn’t change practice.

By pairing a primary endpoint that is unlikely to be affected by the intervention (ventilator-free days in patients who mostly don’t have ventilatory failure) with a noninferiority design, the authors designed a trial which was extremely likely to “succeed.”
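To make that logic concrete, here is a minimal sketch of how a noninferiority check is commonly framed: the new strategy is declared noninferior if the lower bound of a one-sided confidence interval for the difference (new minus reference) stays above the negative of a prespecified margin. The function name, the margin, and all of the numbers are hypothetical illustrations, not values from the RELAX trial, and the normal approximation is an assumption made for simplicity.

```python
def is_noninferior(diff, se, margin, z=1.645):
    """Return True if the new strategy is noninferior to the reference.

    diff   : estimated difference in the endpoint (new minus reference),
             where higher values are better (e.g., ventilator-free days)
    se     : standard error of that difference
    margin : prespecified noninferiority margin (a positive number)
    z      : critical value; 1.645 gives a one-sided 95% interval under
             a normal approximation (an assumption of this sketch)
    """
    lower_bound = diff - z * se
    return lower_bound > -margin


# Hypothetical numbers only. A truly neutral effect (diff = 0) passes
# whenever the trial is large enough to make the standard error small.
neutral_large_trial = is_noninferior(diff=0.0, se=0.5, margin=1.5)
neutral_small_trial = is_noninferior(diff=0.0, se=1.5, margin=1.5)
clearly_worse = is_noninferior(diff=-2.0, se=0.5, margin=1.5)

print(neutral_large_trial, neutral_small_trial, clearly_worse)
```

The point of the sketch is that a noninferiority test asks very little of an intervention. If the endpoint barely responds to the intervention, an adequately sized trial will pass almost by construction.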

results: primary endpoint

As might be expected, different PEEP levels didn’t affect the number of ventilator-free days. In the primary analysis, there was a trend towards more ventilator-free days in the low-PEEP group:

However, if we analyze only patients who survived, there was a trend towards earlier extubation in the high-PEEP group (figure 4A below). This difference may reflect baseline imbalances, wherein patients in the high-PEEP group were sicker (which may have increased the mortality and thereby dragged down the number of ventilator-free days in the high-PEEP group). Regardless, these results are neutral – as expected.
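The way mortality can drag down ventilator-free days is easier to see with a toy calculation. The sketch below assumes the common convention that ventilator-free days are counted over a 28-day window and that a patient who dies within the window scores zero; the cohorts are entirely made up and are not RELAX data.

```python
def ventilator_free_days(days_ventilated, died, window=28):
    """Ventilator-free days under an assumed convention:
    death within the window scores 0, otherwise window minus days ventilated."""
    if died:
        return 0
    return max(0, window - days_ventilated)


def mean_vfd(patients):
    """Average ventilator-free days over a list of (days_ventilated, died) pairs."""
    scores = [ventilator_free_days(days, died) for days, died in patients]
    return sum(scores) / len(scores)


# Hypothetical cohorts of 10 patients each (not trial data).
# Low-PEEP: 3 deaths, survivors ventilated for 5 days.
low_peep = [(5, False)] * 7 + [(5, True)] * 3
# High-PEEP: 4 deaths, survivors ventilated for only 4 days.
high_peep = [(4, False)] * 6 + [(4, True)] * 4

low_mean = mean_vfd(low_peep)    # 7 survivors * 23 days / 10 patients
high_mean = mean_vfd(high_peep)  # 6 survivors * 24 days / 10 patients

print(low_mean, high_mean)
```

In this made-up example, survivors in the high-PEEP cohort come off the ventilator a day earlier, yet the cohort has fewer ventilator-free days on average because one extra death contributes a zero. A composite endpoint like this can point in the opposite direction from a survivors-only analysis.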

results: secondary clinical endpoints

Patients in the low-PEEP group showed trends towards higher rates of atelectasis, hypoxemia, and the requirement for rescue therapy:

You might notice that all the p-values here are very close to 0.99. Why? The authors performed a Holm-Bonferroni correction for multiple testing. By using this correction and adding a lot of secondary endpoints, one can essentially guarantee that none of the secondary endpoints will be statistically significant. Ironically, this is the opposite of most studies – which attempt to achieve superiority by ignoring correction for multiple testing!
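For readers who haven’t met this correction, here is a minimal sketch of the standard Holm-Bonferroni step-down adjustment. The raw p-values are hypothetical and are not taken from the trial; the aim is only to show how piling on endpoints inflates every adjusted p-value.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order.

    The smallest raw p-value is multiplied by m, the next by m - 1, and so
    on; results are capped at 1.0 and forced to be non-decreasing.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for ten secondary endpoints (not trial data).
raw = [0.04, 0.06, 0.08, 0.10, 0.15, 0.20, 0.30, 0.40, 0.50, 0.60]
adjusted = holm_adjust(raw)

for p_raw, p_adj in zip(raw, adjusted):
    print(f"raw p = {p_raw:.2f} -> adjusted p = {p_adj:.2f}")
```

With ten endpoints, a raw p-value of 0.04 becomes roughly 0.40 after adjustment, and anything above about 0.2 is pushed to the cap of 1.0. The same raw p-value of 0.04 would remain 0.04 if it were the only endpoint tested.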

Where does the truth lie? Perhaps somewhere in the middle. I suspect that the trend towards requiring rescue therapy in the low-PEEP group is a real finding, reflecting derecruitment. However, patients in this study often didn’t have lung disease, so the study was underpowered to evaluate this (with low rates of pulmonary deterioration in both groups).

results: secondary physiological endpoints

Some of the most revealing results are hidden in the supplemental appendix. Patients in the low-PEEP arm experienced lower PaO2/FiO2 ratios and higher driving pressure (figure below). This provides direct physiological evidence that the low-PEEP strategy promoted derecruitment.

practice misalignment

Any commentary on this study would be lacking without a discussion of practice misalignment.

Practice misalignment occurs when a study attempts to create two treatment groups which are distinctly separate, and in so doing investigates treatments which don’t reflect clinical reality. For example, imagine designing a transfusion study which compares a transfusion target of >5 g/dL hemoglobin versus >12 g/dL hemoglobin. Such a study is likely to detect differences between the two groups, because management is extremely different. The study is likely to find statistically significant differences, which may appear exciting and render the trial strongly “positive.” Unfortunately, the study doesn’t reflect clinical reality, because neither treatment arm matches actual clinical practice.

In terms of this study, the low-PEEP arm seems bizarre and unrealistic. Using zero PEEP (“ZEEP”) is likely to promote atelectotrauma, so ZEEP is generally avoided in critical care practice. Likewise, 8 cm of PEEP is occasionally used, but this doesn’t seem like usual care either (at least at units that I've worked in). As such, neither arm of this trial resembles standard practice.

A more realistic and clinically useful comparison would be between 5 cm vs. 8 cm of PEEP. However, there is less separation between these two groups, so performing such a study would be challenging.

my overall interpretation of the data

There were no statistically significant differences in any clinical endpoint between the high-PEEP versus low-PEEP therapies. That’s a fact, so it may be a reasonable place to begin. Based on that fact, the study cannot establish the superiority of either treatment approach.

Lack of differences in mortality or ventilator-free days should come as no surprise. There is no reason to expect that minor ventilator adjustments should affect these endpoints, among a population of patients who were mostly intubated for non-pulmonary indications.

Clinical endpoints provide some signals suggesting increased rates of hypoxemia and atelectasis among patients in the low-PEEP group. When combined with secondary physiological endpoints (e.g., lower PaO2/FiO2 ratios and higher driving pressure in the low-PEEP group), it seems reasonably clear that a low-PEEP strategy promotes derecruitment.

Thus, this study won’t affect my practice. Using 5-8 cm of PEEP seems reasonable for most ICU patients (e.g., with higher PEEPs in morbid obesity) and is consistent with normative practice. More robust evidence would be required to shift practice towards a low-PEEP strategy (especially a strategy targeting zero PEEP – which would represent a radical break from current ventilator practices).

where this paper really goes off the rails

This paper concludes that the data supports the use of lower PEEP:

Although the study did find low-PEEP to be noninferior to high-PEEP, this finding has numerous limitations:

Patients in the study generally didn’t tolerate zero PEEP (ZEEP). Thus, patients spent relatively little time at extremely low PEEPs. Therefore, the study shouldn’t be misconstrued as evidentiary support for placing patients on zero PEEP for long periods of time.
Most patients in the study were intubated for non-pulmonary reasons, which may have caused the study to be underpowered to detect subtle impairment in lung function. For example, if the study were repeated with 1,000 patients intubated for pneumonia, the results might look quite different.
Noninferiority was not established among several subgroups of patients (figure below). In particular, patients intubated for respiratory failure showed a trend towards benefiting from higher PEEP. Furthermore, patients with morbid obesity were excluded entirely from the study.

Current conventional practice of generally using a PEEP of ~5 cm seems reasonable among ventilated patients without substantial respiratory failure. Higher PEEPs (e.g., 8 cm) could be useful in some patients (e.g., with morbid obesity or atelectasis).
This trial should not change practice. In particular, this trial should not be misconstrued as evidence that using zero PEEP (ZEEP) is safe.

Image credits: Photo by 傅甬华 on Unsplash