EMNerd-The Case of the Differing Perspectives

The reversal of medical truth is not an uncommon occurrence in today’s world of gullible frequentist methodology. Upon examining a decade’s worth of publications in the NEJM, Prasad et al found that of the trials testing an established medical practice, 40.2% contradicted the initial data responsible for these medical beliefs. This is in part due to our low threshold for changing practice on the basis of a single statistically significant p-value, a finding that in retrospect is just as likely to have occurred by random chance as to reflect a spectacular medical breakthrough (1).
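To make the p-value point concrete: the chance that a single significant result reflects a real effect depends heavily on how plausible the hypothesis was before the trial. The Python sketch below applies Bayes' rule (true positives divided by all positives). The priors, the 80% power, and the function name `prob_effect_is_real` are illustrative assumptions, not values taken from Prasad et al. Under these assumptions, a 10% prior gives a 64% chance the finding is real, and a 5% prior drops it below one half.

```python
def prob_effect_is_real(prior: float, power: float, alpha: float = 0.05) -> float:
    """Share of 'significant' trials that reflect a true effect (positive predictive value).

    prior: assumed probability, before the trial, that the treatment truly works
    power: probability of reaching p < alpha when the effect is real
    alpha: probability of reaching p < alpha when there is no effect
    """
    true_positives = prior * power
    false_positives = (1 - prior) * alpha
    return true_positives / (true_positives + false_positives)


# Illustrative priors only; none of these come from the cited literature.
for prior in (0.5, 0.1, 0.05):
    ppv = prob_effect_is_real(prior, power=0.8)
    print(f"prior {prior:.2f}: P(real effect | p < 0.05) = {ppv:.2f}")
```

The lower the pre-trial plausibility, the closer a lone p < 0.05 comes to a coin flip.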

In 2013, Anderson et al published the INTERACT-2 trial in the NEJM, examining the use of aggressive blood pressure control in the management of intracerebral hemorrhage (ICH) (2). Their results, widely cited as positive, bolstered the already common practice of early aggressive blood pressure management in patients presenting with ICH. And yet, despite this enthusiasm, their results were far less positive than general opinion would have you believe. In light of the recent negative results of the ATACH-2 trial, published online in the NEJM on June 8, 2016 (3), it is appropriate to reevaluate the utility of this resource-intensive management strategy.

Qureshi et al randomized 1000 adult patients presenting with ICH within 4.5 hours of symptom onset, with a GCS of 5 or more and a systolic blood pressure greater than 180 mm Hg, to either aggressive blood pressure management (target 110-139 mm Hg) or a more liberal approach (140-179 mm Hg). The trial originally planned to enroll 1280 patients but was stopped early for futility. At the second pre-planned interim analysis, the authors found no difference in their primary endpoint, the proportion of patients dead or disabled (mRS of 4 to 6) at 90 days. Patients randomized to the aggressive strategy had a 90-day rate of death or disability of 38.7%, compared to 37.7% in patients randomized to the more liberal strategy. Mortality was 6.6% and 6.8% in the two groups, respectively. There was no difference at 90 days in the EQ-5D, a tool for assessing health status and well-being. Treatment-related adverse events were similar between the two groups (1.7% and 1.2%, respectively). And while the 90-day rate of adverse events was significantly higher in patients who received aggressive blood pressure management (25.6% vs 20.0%; P = .05), this was primarily due to an increase in clinically inconsequential changes in renal function.

How do we reconcile these negative results with the positive findings presented by the INTERACT-2 authors? Do the results of ATACH-2 represent a medical reversal? I think not. While the authors of ATACH-2 offer an explanation based on physiologic reasoning, I suspect the truth is far simpler: these two articles are examining the very same truth from two very different perspectives.

In INTERACT-2, Anderson et al randomized 2794 patients with ICH within 6 hours of symptom onset to either aggressive management (< 140 mm Hg) or a more lenient goal of less than 180 mm Hg. Despite the positive claims surrounding this trial, the authors found no statistically significant benefit in their primary endpoint, the rate of death or disability (mRS of 3 to 6) at 90 days: 52% of patients in the aggressive management group had an mRS of 3 or greater, compared to 55.3% in the control group (p = 0.06). The authors also found no difference in 90-day mortality (11.9% and 12%, respectively). Even the European Quality of Life–5 Dimensions (EQ-5D) scores, though statistically different (0.60 ± 0.39 vs 0.55 ± 0.40; p = 0.002), differed by less than what is considered the minimum clinically important difference (4). It was not until the authors performed an ordinal analysis that their statistical endeavors proved fruitful. Using this form of statistical chicanery, the authors found a shift towards statistically lower mRS scores in patients randomized to aggressive blood pressure control compared to the control group.

We have discussed the use of ordinal analyses in the past, but essentially they are an attempt to examine how an intervention influences functional neurologic outcomes over the entire range of mRS scores. Some authors argue in favor of such an analysis over the dichotomous outcome we are more accustomed to, contending that it is a more sensitive assessment of efficacy (5,6). Their argument is that a shift in mRS from 2 to 0, or from 5 to 3, would go unnoticed on a dichotomous scale (with a cutoff at an mRS of 3, neither patient crosses it), whereas an ordinal analysis would theoretically detect these improvements. And while this logic seems reasonable, conducting such analyses can prove problematic.
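A toy version of this argument, written in Python with invented mRS counts (grades 0 through 6, 100 patients per arm). The hypothetical "treated" arm is the control arm with five patients moved from mRS 2 to 0 and five moved from mRS 5 to 3. The dichotomous outcome (mRS of 3 or greater) is identical in both arms, while a simple ordinal summary, the probability that a randomly chosen treated patient has a better mRS than a randomly chosen control patient (the quantity underlying the Mann-Whitney test), rises above 0.5. This is a sketch of the concept only; published shift analyses more often use ordinal logistic regression.

```python
from itertools import product

# Hypothetical counts of patients at each mRS grade 0..6 (invented for illustration).
control = [10, 15, 20, 20, 15, 10, 10]
# Same cohort, with 5 patients moved from mRS 2 to 0 and 5 moved from mRS 5 to 3.
treated = [15, 15, 15, 25, 15, 5, 10]


def prop_poor(counts, cut=3):
    """Dichotomous outcome: proportion of patients with mRS >= cut."""
    return sum(counts[cut:]) / sum(counts)


def prob_better(a, b):
    """Ordinal summary: P(a random patient from a has a lower mRS than one from b),
    counting ties as one half (the quantity behind the Mann-Whitney test)."""
    wins = ties = 0
    for (grade_a, n_a), (grade_b, n_b) in product(enumerate(a), enumerate(b)):
        if grade_a < grade_b:
            wins += n_a * n_b
        elif grade_a == grade_b:
            ties += n_a * n_b
    return (wins + 0.5 * ties) / (sum(a) * sum(b))


print(f"mRS >= 3: control {prop_poor(control):.2f}, treated {prop_poor(treated):.2f}")
print(f"P(treated patient has the better mRS): {prob_better(treated, control):.3f}")
```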

Ordinal analyses, and the statistical tools used to carry them out, assume that the treatment effect is uniform across the entirety of the scale. More importantly, they assume a granularity in the data that is not achievable in the clinical arena. In conducting an ordinal analysis, the authors are essentially assuming that the functional assessments performed at 90 days were infallible, despite multiple studies demonstrating the unreliability of the mRS (7,8,9). Two clinicians assessing the same patient will often disagree by one or more grades on the mRS. But this unreliability is not uniform. Clinicians reliably assess patients whose functional status lies at either extreme of the range. Patients with little or no functional limitation will consistently be rated an mRS of 0 or 1 by multiple clinicians. Likewise, patients with severe disability will consistently receive an mRS of 5. Where the reliability of the mRS falters is in its middle range (mRS of 2, 3, or 4), the very region where the ordinal analysis hopes to differentiate small shifts in neurological function.

And so what you are left with is a secondary analysis of a subjective outcome score, in an open-label trial, using a tool that is highly unreliable. The potential confounders are numerous, and to think that you can measure neurological outcome with enough accuracy to perform a shift analysis is at best naive and at worst misleading. What appeals to most authors about an ordinal analysis is its ability to augment statistical power. While it cannot reliably detect the small differences in functional outcome its supporters claim, it is most certainly capable of finding statistical significance in an otherwise negative trial.
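The "uniform treatment effect" assumption can be made concrete. Ordinal logistic regression under the proportional-odds model summarizes the effect as a single odds ratio assumed to hold at every possible cut point of the mRS. The Python sketch below uses invented counts in which the benefit comes only from shifts of 2 to 0 and 5 to 3, and computes the odds ratio separately at each cut point; if proportional odds held, these numbers would be roughly equal. Here they range from about 0.63 to 1.0. The counts and the helper name `cumulative_odds_ratios` are hypothetical, and this is an informal check written for illustration, not a formal test such as the Brant test.

```python
# Hypothetical mRS counts for grades 0..6 (invented for illustration).
control = [10, 15, 20, 20, 15, 10, 10]
treated = [15, 15, 15, 25, 15, 5, 10]


def cumulative_odds_ratios(treat, ctrl):
    """Odds ratio of mRS >= k (treated vs control) at each cut point k = 1..6.

    A proportional-odds model assumes these are all (roughly) the same number.
    Assumes no cut point leaves an empty group; a zero count would divide by zero.
    """
    ratios = {}
    for k in range(1, len(ctrl)):
        t_hi, t_lo = sum(treat[k:]), sum(treat[:k])
        c_hi, c_lo = sum(ctrl[k:]), sum(ctrl[:k])
        ratios[k] = (t_hi / t_lo) / (c_hi / c_lo)
    return ratios


for k, ratio in cumulative_odds_ratios(treated, control).items():
    print(f"mRS >= {k} vs < {k}: odds ratio {ratio:.2f}")
```

A single "common odds ratio" reported from such data averages over cut points where the intervention did nothing and cut points where it appeared to help.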

When you exclude the ill-gotten gains of INTERACT-2’s secondary analysis, what you are left with is a negative trial. ATACH-2 only serves to validate these results. As it stands, we have two large trials, both demonstrating that aggressive blood pressure management offers little clinical utility in patients presenting with ICH. Despite these consistently negative results, I fear ATACH-2 will have little influence on general practice. Those who believe the secondary findings from INTERACT-2 will argue that the ATACH-2 cohort was too underpowered to detect a 3% absolute difference, forgetting that without the help of an ordinal analysis, the larger and more robust INTERACT-2 cohort also failed to demonstrate a statistically significant benefit.
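The power argument cuts both ways, and a back-of-the-envelope calculation shows why. The Python sketch below uses the normal approximation for a two-sided, two-proportion z-test. The inputs are assumptions for illustration: a 3-percentage-point absolute difference, control event rates near those each trial observed on its own primary outcome (about 38% for ATACH-2 and 55% for INTERACT-2), and an even split of each cohort between arms (500 and 1397 per arm). Under these assumptions, the power to detect a 3% difference is well under 50% for both trials, so the larger INTERACT-2 cohort was also underpowered for the effect size its defenders invoke.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control: float, p_treat: float, n_per_arm: int,
                          alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-proportion z-test.

    Uses the normal approximation with the unpooled standard error under the
    alternative hypothesis; adequate for rough reasoning, not for trial design.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    se = sqrt((p_control * (1 - p_control) + p_treat * (1 - p_treat)) / n_per_arm)
    shift = abs(p_control - p_treat) / se
    # Probability the z statistic falls beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Assumed inputs: 3-point absolute difference, even split of each cohort.
atach2 = power_two_proportions(0.38, 0.35, n_per_arm=500)
interact2 = power_two_proportions(0.55, 0.52, n_per_arm=1397)
print(f"ATACH-2-sized trial:    power ~ {atach2:.2f}")
print(f"INTERACT-2-sized trial: power ~ {interact2:.2f}")
```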

Sources Cited:

1. Prasad V, Vandross A, Toomey C, et al. A decade of reversal: an analysis of 146 contradicted medical practices. Mayo Clin Proc. 2013;88(8):790-8.
2. Anderson CS, Heeley E, Huang Y, et al. Rapid blood-pressure lowering in patients with acute intracerebral hemorrhage. N Engl J Med. 2013;368(25):2355-65.
3. Qureshi AI, Palesch YY, Barsan WG, et al. Intensive blood-pressure lowering in patients with acute cerebral hemorrhage. N Engl J Med. 2016.
4. Pickard AS, Neary MP, Cella D. Estimation of minimally important differences in EQ-5D utility and VAS scores in cancer. Health Qual Life Outcomes. 2007;5:70.
5. Bath PM, Gray LJ, Collier T, Pocock S, Carpenter J. Can we improve the statistical analysis of stroke trials? Statistical reanalysis of functional outcomes in stroke trials. Stroke. 2007;38(6):1911-5.
6. Saver JL, Gornbein J. Treatment effects for which shift or binary analyses are advantageous in acute stroke trials. Neurology. 2009;72(15):1310-5.
7. van Swieten JC, Koudstaal PJ, Visser MC, Schouten HJ, van Gijn J. Inter-observer agreement for the assessment of handicap in stroke patients. Stroke. 1988;19(5):604-7.
8. Wilson JT, Hareendran A, Grant M, Baird T, Schulz UG, Muir KW, et al. Improving the assessment of outcomes in stroke: use of a structured interview to assign grades on the modified Rankin scale. Stroke. 2002;33(9):2243-6.
9. Wilson JT, Hareendran A, Hendry A, Potter J, Bone I, Muir KW. Reliability of the modified Rankin scale across multiple raters: benefits of a structured interview. Stroke. 2005;36(4):777-81.