EM Nerd-The Case of the Erratic Pendulum

In January 2015, the NEJM published the results of the MR CLEAN trial (1). As the first RCT to demonstrate a benefit for endovascular therapy in acute ischemic stroke, its findings sent ripples through the world of medicine. In its wake, six trials were left marooned, their enrollment halted prematurely because of the benefits demonstrated by Berkhemer et al. THRACE (2) marks the last of the trials that will be forever distorted by the publication of MR CLEAN, or as I like to call it, The Trial to End All Trials.

Like EXTEND-IA, ESCAPE, SWIFT-PRIME, and REVASCAT, all published in NEJM in 2015, THRACE reported positive findings, once again validating the benefits of endovascular therapy in acute ischemic stroke. But unlike these prior trials, its results were far from the spectacular findings that splattered the pages over the prior year.

Bracard et al randomized 418 patients presenting within 4 hours of symptom onset with a large vessel occlusion amenable to endovascular intervention (confirmed by either CT or MR angiography) to either IV tPA alone or IV tPA with the addition of mechanical thrombectomy. 17% of the patients randomized to receive mechanical thrombectomy never underwent the procedure because of rapid neurological improvement. Despite its early stoppage, the authors found a statistically significant improvement in the number of patients alive and independent at 90 days in those randomized to receive mechanical thrombectomy. An mRS of 0-2 was achieved by 42% of the patients who received IV tPA alone compared to 53% of the patients who received IV tPA with the addition of mechanical thrombectomy. This 11 percentage point difference reached statistical significance (p=0.028). They also reported no difference in the rate of symptomatic intracranial hemorrhage between the two groups (2% in each group) (2).
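For readers who want the arithmetic behind that 11-point difference, here is a minimal Python sketch. The function names are my own, and the only inputs are the two percentages quoted above. The number needed to treat it produces is a point estimate that ignores the uncertainty around the difference, so treat it as an illustration rather than a figure reported by the THRACE authors.

```python
def absolute_risk_difference(p_treat, p_control):
    """Difference in the proportion of good outcomes between the two arms."""
    return p_treat - p_control


def number_needed_to_treat(p_treat, p_control):
    """Reciprocal of the absolute difference (point estimate only)."""
    diff = absolute_risk_difference(p_treat, p_control)
    if diff <= 0:
        raise ValueError("NNT is only defined here when the treatment arm does better")
    return 1.0 / diff


if __name__ == "__main__":
    # Proportions with mRS 0-2 at 90 days, as quoted in the text above.
    p_thrombectomy = 0.53
    p_tpa_alone = 0.42
    diff = absolute_risk_difference(p_thrombectomy, p_tpa_alone)
    nnt = number_needed_to_treat(p_thrombectomy, p_tpa_alone)
    print(f"absolute difference: {diff:.2f}")
    print(f"NNT point estimate: {nnt:.1f}")
```

With these inputs the point estimate works out to roughly 9 patients treated for one additional patient alive and independent at 90 days.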

Interestingly, when the results were examined using an ordinal analysis, the difference between the two cohorts was no longer statistically significant (2). This is the same ordinal analysis made famous for pulling IST-3 from the depths of statistical triviality. This comment is not intended to insinuate that the null hypothesis is true; there is clearly a difference between the two groups examined in the THRACE cohort. Rather, it is meant to highlight that without the ordinal analysis, the results of trials such as IST-3 and INTERACT-2 would have been dismissed as negative. In those cases, the authors emphasized the importance of such statistical manipulations. But when the ordinal analysis is at odds with the traditional dichotomous measurement and fails to support a statistically positive trial, its findings are dispatched in a single sentence: “Considering the modified Rankin score at 3 months as an ordinal variable in a regression model, we did not find any difference between the IVTMT and IVT groups.”

Like the trials before it, THRACE was stopped early due to the findings of MR CLEAN. But unlike ESCAPE, EXTEND-IA, SWIFT-PRIME, and REVASCAT, THRACE neared completion before its untimely demise. The authors originally planned on enrolling 480 patients, so the 418 they enrolled was only 62 fewer than their intended sample size. This leaves THRACE as the largest cohort of patients enrolled in an endovascular trial since the publication of MR CLEAN (n=500). And while comparisons across trials are not always appropriate, it is interesting that the two largest cohorts examining endovascular therapy for acute CVA report much more modest benefits than the spectacular results described by the trials stopped after enrolling just a small fraction of their intended subjects (1,2). Many (including myself) have offered differences in imaging criteria and the individual trials’ time constraints as the reason for this variability. But given the degree of statistical instability in this data set, such a hypothesis is far from conclusive. In fact, random deviations around the true effect size may be equally responsible for the differences observed between these trials.

As discussed in a previous post, there is a risk to the premature stoppage of trials for positive results (3). Early in a trial’s enrollment it is not uncommon to observe wild variations in the effect size, as the small sample size heightens the influence of random error. As the cohort approaches its full sample size these variations will diminish, regressing towards the true effect size. Imagine the path of a pendulum (Fig. 1), spanning a substantial arc shortly after being set in motion (T₂). But with time and a loss of kinetic energy, its swing lessens (T₁), eventually deviating minimally from its position of central tendency (T₀). As such, during any interim analysis it is unclear where in the pendulum’s arc we have measured our sample (y₀, y₁, or x). It is difficult to distinguish whether the observed effect size is due to a true difference or the random variation of sampling error. Of course, you are just as likely to perform an interim analysis that underestimates an intervention’s true effect size, but it is far less likely such results will prompt a trial’s premature termination. Nor are such findings likely to be published in prominent journals. In short, trials stopped early for benefit will on average overestimate the effect size of the therapy in question (4). This risk increases in trials stopped after enrolling very small sample sizes (5). And these statistical embellishments are only amplified because trials stopped early for benefit are more likely to be published in high impact journals. The consortium of trials terminated prematurely because of MR CLEAN is obviously at risk of both these forms of amplification. The first few trials, citing spectacular results, were published in the NEJM with impressive acclaim and fanfare. And with time, as the novelty of the therapy lost its luster, so did the celebrity of the trials. Each one was published with waning excitement, fading interest, and diminishing results, until the final two trials, THERAPY and THRACE, published recently, incited nothing more than a whisper.
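The pendulum argument can be made concrete with a small simulation. The Python sketch below is purely illustrative and does not model any of the actual trials. It assumes a true benefit of 11 percentage points (42% vs 53%, numbers borrowed from the THRACE point estimates only for convenience), a planned enrollment of 240 patients per arm, a single interim look at 60 per arm, and a made-up stopping threshold of z > 2.5. All of these, along with the function names, are my own assumptions.

```python
import random
from math import sqrt
from statistics import mean


def simulate_trial(rng, p_control, p_treat, n_per_arm, n_interim, z_stop):
    """Simulate one two-arm trial with a single interim look.

    Returns (interim_estimate, stopped_early, full_enrollment_estimate).
    """
    # Each patient has a good outcome with the true probability of their arm.
    control = [rng.random() < p_control for _ in range(n_per_arm)]
    treat = [rng.random() < p_treat for _ in range(n_per_arm)]

    def diff_and_z(n):
        # Observed difference in proportions among the first n patients per arm,
        # with a pooled two-proportion z statistic (equal arm sizes).
        pc = sum(control[:n]) / n
        pt = sum(treat[:n]) / n
        pooled = (pc + pt) / 2
        se = sqrt(2 * pooled * (1 - pooled) / n)
        z = (pt - pc) / se if se > 0 else 0.0
        return pt - pc, z

    interim_diff, interim_z = diff_and_z(n_interim)
    full_diff, _ = diff_and_z(n_per_arm)
    return interim_diff, interim_z > z_stop, full_diff


def run_simulation(n_trials=2000, seed=1):
    """Compare estimates from trials stopped early with trials run to completion."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    results = [
        simulate_trial(rng, 0.42, 0.53, n_per_arm=240, n_interim=60, z_stop=2.5)
        for _ in range(n_trials)
    ]
    early = [d for d, stopped, _ in results if stopped]
    full = [f for _, _, f in results]
    return mean(early), mean(full), len(early)


if __name__ == "__main__":
    mean_early, mean_full, n_early = run_simulation()
    print(f"trials stopped early: {n_early}")
    print(f"mean estimate when stopped early: {mean_early:.3f}")
    print(f"mean estimate at full enrollment: {mean_full:.3f}")
```

Because a trial can only cross the stopping threshold at the interim look when its observed difference is large, the trials that stop early are a selected sample of the widest swings. Their average estimate sits well above the assumed true effect of 0.11, while the average across all trials run to full enrollment stays close to it.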

This leads us to the sixth and final trial stopped prematurely due to MR CLEAN. THERAPY was published quietly earlier this year in Stroke (6). Its hushed reveal was not due to any inferiority in methodological rigor, but rather because it suffered the unenviable fate of underperforming due to random variation. Like their colleagues, following the publication of MR CLEAN, the authors of the THERAPY trial halted enrollment after recruiting only 108 of the 692 patients they originally intended. But unlike the other trials, their results were neither statistically nor clinically impressive. Mocco et al found no significant difference between those who received IV tPA with the addition of aspiration thrombectomy and those who received IV tPA alone. Although the patients in the endovascular arm performed better than those randomized to receive IV tPA alone (mRS 0-2 of 38% vs 30%), this difference failed to reach statistical significance (p=0.52). A novel aspiration device (the Penumbra System) or a slightly different patient population (large vessel occlusions with clots >8 mm in length) may have been responsible for their underperformance. But it is equally likely their feeble accomplishments were due to the early stoppage and an undersized sample, which leaves the estimate exposed to a large degree of random error.
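To see how much room random error has at this sample size, here is a rough Python sketch of a Wald 95% confidence interval for a difference in proportions. The per-arm numbers are my assumption: the 108 enrolled patients are split evenly into 54 per arm, and the planned 692 into 346 per arm, since the exact arm sizes are not given above. The Wald interval is itself only an approximation, and the function name is hypothetical.

```python
from math import sqrt


def risk_difference_ci(p_treat, p_control, n_treat, n_control, z=1.96):
    """Approximate (Wald) confidence interval for p_treat - p_control."""
    diff = p_treat - p_control
    se = sqrt(
        p_treat * (1 - p_treat) / n_treat
        + p_control * (1 - p_control) / n_control
    )
    return diff - z * se, diff + z * se


if __name__ == "__main__":
    # Observed mRS 0-2 proportions quoted above; arm sizes are assumed equal splits.
    for label, n in (("enrolled (assumed 54 per arm)", 54),
                     ("planned (assumed 346 per arm)", 346)):
        low, high = risk_difference_ci(0.38, 0.30, n, n)
        print(f"{label}: {low:+.3f} to {high:+.3f}")
```

Under these assumptions the interval at the enrolled sample size spans zero and stretches from a modest harm to a benefit larger than anything THRACE reported. The same observed proportions at the planned sample size would give an interval less than half as wide.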

I think even the most skeptical of us have accepted that, in the appropriate patient, endovascular therapy is beneficial. The exact patient population that will benefit from this intervention and the magnitude of this benefit remain unclear. There will always be random variation distributed around the true effect size of any therapy examined. The degree of variation present depends largely on the size of the trial and the intrinsic biases in its design. Generally, the small sample sizes of the individual studies in this data set led to wide variations in the observed results. The unanimous early termination of each of these trials only adds to the volatility of these findings. I suspect the true effect size is closer to the results described in MR CLEAN and THRACE than to the wildly optimistic findings of EXTEND-IA, ESCAPE, and SWIFT-PRIME. But I fear that, as with thrombolytics, we will never know the true efficacy of endovascular therapy. The results of EXTEND-IA and ESCAPE will be used in vain to cite wildly inappropriate NNTs, while the more modest findings of MR CLEAN and THRACE will be used only for the statistical legitimacy they lend to the smaller sample sizes of their more boastful comrades.

Sources Cited:

1. Berkhemer OA, Fransen PS, Beumer D, et al. A randomized trial of intraarterial treatment for acute ischemic stroke. N Engl J Med. 2015;372(1):11-20.
2. THRACE investigators, Bracard S, Ducrocq X, et al. Mechanical thrombectomy after intravenous alteplase versus alteplase alone after stroke (THRACE): a randomised controlled trial. Lancet Neurol. 2016; published online Aug 22.
3. Guyatt GH, Briel M, Glasziou P, Bassler D, Montori VM. Problems of stopping trials early. BMJ. 2012;344:e3863.
4. Pocock SJ, Hughes MD. Practical problems in interim analyses, with particular regard to estimation. Control Clin Trials. 1989;10(suppl 4):209S-221S.
5. Montori VM, Devereaux PJ, Adhikari NK, Burns KE, Eggert CH, Briel M, et al. Randomized trials stopped early for benefit: a systematic review. JAMA. 2005;294:2203-9.
6. Mocco J, Zaidat OO, von Kummer R, et al. Aspiration thrombectomy after intravenous alteplase versus intravenous alteplase alone. Stroke. 2016;47(9):2331-8.