Since the earliest trials examining the efficacy of tPA for acute ischemic stroke, there has been a tendency to play fast and loose with the scientific method. The results of the landmark NINDS-2 trial (1), a moderately sized RCT with a tenuously positive primary outcome (Fragility Index of 3), were never validated. The results of ECASS-3 (2), the second positive RCT examining tPA for ischemic stroke, led to the extension of the treatment window from 3 to 4.5 hours despite being in conflict with a decade of prior literature. IST-3 (3), a negative trial and the largest trial to date, has been broadly cited as a success based on a secondary measure using an ordinal regression of dubious methodological rigor. But with the publication of the EXTEND trial (4) by Ma et al in the NEJM, we have moved from a casual handling of the scientific method to a brazen disregard.
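For readers unfamiliar with the Fragility Index, it is the number of patients in a statistically significant trial whose outcome would need to flip for the p-value to cross 0.05. A minimal sketch of the calculation follows, assuming a two-sided Fisher exact test and the usual convention of converting non-events to events in the arm with fewer events. The function names are illustrative, and the counts in the usage line are hypothetical; they are not the NINDS data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are trial arms; columns are events and non-events.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x events in the first arm.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Outcome flips needed before a significant result loses significance."""
    p = fisher_exact_two_sided(events_a, n_a - events_a, events_b, n_b - events_b)
    if p >= alpha:
        return 0  # not significant to begin with
    if events_a > events_b:
        # Work on the arm with fewer events.
        events_a, n_a, events_b, n_b = events_b, n_b, events_a, n_a
    flips = 0
    while p < alpha and events_a < n_a:
        events_a += 1  # one non-event becomes an event
        flips += 1
        p = fisher_exact_two_sided(events_a, n_a - events_a, events_b, n_b - events_b)
    return flips


# Hypothetical trial: 10/100 events in one arm versus 30/100 in the other.
print(fragility_index(10, 100, 30, 100))
```

A Fragility Index of 3 means that three patients with a different outcome would have been enough to erase the statistical significance of the result.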
Ma et al enrolled adult patients presenting with symptoms concerning for a CVA, within 4.5-9 hours of symptom onset, with an NIHSS of 4 to 26, who had salvageable brain tissue detected on perfusion imaging. Patients were randomized to either IV tPA or placebo. Patients who experienced stroke-like symptoms upon awakening from sleep were also eligible for enrollment if they demonstrated appropriate perfusion imaging.
The trial, initially planned to enroll 400 patients, was stopped prematurely due to “loss of equipoise” after the publication of the WAKE-UP trial (5), which found a fairly fragile signal of benefit in favor of IV tPA.
From August 2010 through June 2018 Ma et al enrolled 225 patients at 28 centers across Australia, Asia, and Europe. The majority of these patients (65%) presented with symptoms upon awakening from sleep, 25% presented between 6-9 hours after symptom onset and 10% presented within 4.5-6 hours. The trial found no difference in the rate of its primary outcome, the number of patients with a modified Rankin scale (mRS) score of 0 or 1 at 90 days, which occurred in 35.4% of patients in the tPA group and 29.5% in the placebo group (RR 1.2, 95% CI 0.82–1.76, P=0.35). Nor did they observe a difference in the ordinal analysis or 90-day mortality. Of note, the rate of symptomatic intracranial hemorrhage within 36 hours, while not statistically significant, is consistent with every other trial examining IV tPA for CVA, occurring in 6.2% of the tPA group and 0.9% of the control group.
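The unadjusted figures can be checked by hand. The sketch below assumes arms of 113 (tPA) and 112 (placebo) patients with 40 and 33 good outcomes respectively; the post gives only the percentages and the total of 225, and these counts are an assumption chosen because they reproduce 35.4% and 29.5%. It uses a standard Wald interval on the log risk ratio, which may not be exactly the method the authors used.

```python
from math import exp, log, sqrt
from statistics import NormalDist

# Assumed counts (not stated in the post): good outcomes / patients per arm.
tpa_events, tpa_n = 40, 113
placebo_events, placebo_n = 33, 112

# Unadjusted risk ratio for mRS 0-1 at 90 days.
rr = (tpa_events / tpa_n) / (placebo_events / placebo_n)

# Standard error of log(RR) for two independent binomial proportions.
se_log_rr = sqrt(1 / tpa_events - 1 / tpa_n + 1 / placebo_events - 1 / placebo_n)

# 95% Wald confidence interval, built on the log scale and exponentiated.
z_crit = NormalDist().inv_cdf(0.975)
ci_low = exp(log(rr) - z_crit * se_log_rr)
ci_high = exp(log(rr) + z_crit * se_log_rr)

# Two-sided p-value for the null hypothesis RR = 1.
z = log(rr) / se_log_rr
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"RR {rr:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}, p = {p_value:.2f}")
```

Under these assumptions the point estimate and interval round to the published RR of 1.2 (0.82 to 1.76), and the p-value lands close to the reported 0.35.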
By all accounts this is an impressively negative trial and yet somehow the authors conclude,
“Among the patients in this trial who had ischemic stroke and salvageable brain tissue, the use of alteplase between 4.5 and 9.0 hours after stroke onset or at the time the patient awoke with stroke symptoms resulted in a higher percentage of patients with no or minor neurologic deficits than the use of placebo.”
To support this claim, the authors cited the adjusted analysis performed on their primary outcome, reporting an adjusted risk ratio of 1.44 (95% CI 1.01 to 2.06; P=0.04). With what statistical wizardry were they able to transform an entirely unimpressive p-value of 0.35 to 0.04? In this case the methodological sleight of hand came in the form of a Poisson regression analysis. Poisson regression analyses, like all regression analyses, are a means of controlling for covariates that may influence the outcome in question. It is not clear why the authors chose to use this specific form of regression analysis over the more traditional logistic regression model. It is interesting to note that in their original statistical analysis plan the authors did not plan to perform a Poisson analysis, stating they would perform a binary logistic regression. It was not until just prior to the publication of their final manuscript that the Poisson analysis was proposed as the preferred method of analysis. Tucked away in the supplementary appendix one can find their original binary logistic regression analysis, which, like the unadjusted results, found no difference between the groups.
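One practical difference between the two models is the effect measure each reports: a logistic model yields an odds ratio, while a Poisson model with a log link applied to a binary outcome yields a risk ratio. The sketch below fits an unadjusted log-link Poisson model by Newton-Raphson in plain Python. The per-arm counts (40/113 and 33/112) are assumptions consistent with the reported percentages, and no covariates are included because patient-level data are not available here, so this does not reproduce the authors' adjusted 1.44.

```python
from math import exp, log

# (patients, patients with mRS 0-1, treatment indicator); counts are assumed.
groups = [(112, 33, 0), (113, 40, 1)]


def fit_poisson_log_link(groups, iterations=25):
    """Newton-Raphson fit of log(risk) = b0 + b1 * treatment."""
    total_n = sum(n for n, _, _ in groups)
    total_events = sum(e for _, e, _ in groups)
    b0, b1 = log(total_events / total_n), 0.0  # start at the pooled risk
    for _ in range(iterations):
        s0 = s1 = i00 = i01 = i11 = 0.0
        for n, events, x in groups:
            mu = n * exp(b0 + b1 * x)  # expected events in this arm
            s0 += events - mu          # score for the intercept
            s1 += x * (events - mu)    # score for the treatment term
            i00 += mu                  # information matrix entries
            i01 += x * mu
            i11 += x * x * mu
        det = i00 * i11 - i01 * i01
        # Newton step: coefficients += inverse(information) @ score
        b0 += (i11 * s0 - i01 * s1) / det
        b1 += (i00 * s1 - i01 * s0) / det
    return b0, b1


b0, b1 = fit_poisson_log_link(groups)
risk_ratio = exp(b1)

# What an unadjusted logistic model would report for the same 2x2 table.
odds_ratio = (40 / (113 - 40)) / (33 / (112 - 33))

print(f"risk ratio {risk_ratio:.2f}, odds ratio {odds_ratio:.2f}")
```

Only the point estimate is shown. Because a binary outcome is not truly Poisson distributed, the model-based standard errors from such a fit are too wide, which is why this approach is normally paired with robust (sandwich) standard errors.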
More important than quibbling over the preferred regression analysis is to question whether a regression analysis should have been performed in the first place. Regression analyses are statistical methods which attempt to control for any confounding variables that might be present. They are a useful tool in observational cohorts, which typically have a great deal of non-random error. In an RCT any differences in baseline variables can be attributed to random chance, and are ideally limited by increasing the sample size or, in this case, by not stopping the study before reaching the preplanned sample. No amount of regression analyses will change the fact that this was a small trial, stopped prematurely, with a minimal difference observed between the tPA and control groups. At very best the results of their Poisson analysis should be viewed as hypothesis generating.
The problem with this study is not the regression analysis used by the authors, but rather their underlying intentions. Were they attempting to unearth an otherwise obscured truth or simply manufacture a positive trial? Ma et al set out to examine the utility of IV tPA in patients within 4.5-9 hours of symptom onset who had salvageable brain tissue demonstrated on perfusion imaging. This concept has been given some clinical legitimacy with the publication of the endovascular trials demonstrating that certain patients far outside the traditional time windows benefited from reperfusion therapy (6,7). And while this is a valid hypothesis, in this case it was made in bad faith. If the authors believe that time is a poor surrogate for salvageable tissue, then why only examine patients outside the 4.5-hour window? If time is a poor surrogate, then patients presenting in under 4.5 hours without salvageable tissue demonstrated on perfusion imaging should not benefit from reperfusion therapy. Why maintain empiric time thresholds for the first 4.5 hours and only then move towards individualized selection of patients with appropriate perfusion studies? The answer is simple: this study, like all the studies before it, is not about identifying the small select subset of patients who may benefit from IV tPA, but rather is designed to expand the number of patients who receive this medication. To do so the authors have created a positive trial from the statistical remains of a negative study, a statistical parlor trick, intended to EXTEND the use of IV tPA beyond what is supported by science.
Sources Cited:
- Tissue plasminogen activator for acute ischemic stroke. N Engl J Med. 1995;333(24):1581-7.
- Hacke W, Kaste M, Bluhmki E, et al. Thrombolysis with alteplase 3 to 4.5 hours after acute ischemic stroke. N Engl J Med. 2008;359(13):1317-29.
- Sandercock P, Wardlaw JM, Lindley RI, et al. The benefits and harms of intravenous thrombolysis with recombinant tissue plasminogen activator within 6 h of acute ischaemic stroke (the third international stroke trial [IST-3]): a randomised controlled trial. Lancet. 2012;379(9834):2352-63.
- Ma H, Campbell BCV, Parsons MW, et al. Thrombolysis guided by perfusion imaging up to 9 hours after onset of stroke. N Engl J Med. 2019;380:1795-1803.
- Thomalla G, Simonsen CZ, Boutitie F, et al. MRI-Guided Thrombolysis for Stroke with Unknown Time of Onset. N Engl J Med. 2018;379(7):611-622.
- Nogueira RG, Jadhav AP, Haussen DC, et al. Thrombectomy 6 to 24 Hours after Stroke with a Mismatch between Deficit and Infarct. N Engl J Med. 2018;378(1):11-21.
- Albers GW, Marks MP, Kemp S, et al. Thrombectomy for Stroke at 6 to 16 Hours with Selection by Perfusion Imaging. N Engl J Med. 2018;378(8):708-718.
Always enjoy reading your work on this site. Thanks for the write up.
Of all the dodgy research practices you have documented over the years this has to be the most outrageous.
There is no point in going to all the effort of performing an RCT (randomisation, allocation concealment, blinding etc) if you are going to do a regression analysis in an attempt to adjust for confounders. The foolishness of this has been well-described by Stephen Senn. It makes no sense. It is p-hacking, pure and simple.
I don’t see this as “p-hacking”, but it is a confusing choice of analysis method in this case. Poisson regression is OK to use (log-linear regression is fine), and it may actually be correct, but I’d have to explore the data set to know for sure. I would say this: using log-linear regression isn’t wrong on its face, and it’s a leap to say it is. The authors should have identified in their manuscript why they used it after they used binary logistic regression, though. It may have come out of the peer review process, it may have been asked…
If the predefined analysis is negative and the actual final analysis is positive, it is p-hacking until proved otherwise.
The authors did, in fact, write an entirely separate manuscript about their statistical analysis plan, which was finalized before database lock and without access to trial outcome data. The authors actually explained the decision to change the analysis plan: https://www.ncbi.nlm.nih.gov/pubmed/30523735 “During the publication peer-review process for EXTEND IA TNK RCT in early 2018, New England Journal of Medicine Editorial Office expressed clear preference for reporting effect sizes for dichotomous outcomes as risk ratios (RRs), as these are more appropriate for the prospective nature of the RCT. We therefore pre-specify adjusted RRs as the primary analysis for this RCT but…
I’m sorry, but this comment is simply wrong (and it’s especially curious that you would cite Stephen Senn on this issue, as he has written many times to advocate FOR multivariable regression analysis in randomized controlled trials). For example, in Senn’s well-known “Seven Myths” article (link below), the passage on “Myth 6” is entirely advocating FOR the adjustment of baseline covariates. https://onlinelibrary.wiley.com/doi/full/10.1002/sim.5713 It is well-known in the statistical community that a regression analysis with adjustment for baseline covariates known to have strong association with outcome has several advantages. But don’t just take my word for it: Frank Harrell’s book –…
Rory
Take an hour, consolidate the post, and get this into NEJM letters to the editor. I would love to see how the authors respond. Given what you have cited above, you have a very high probability of getting the letter accepted.
Brad
Agreed! This is a spot-on critique, and it deserves to be acknowledged by a wider audience.
Except, it *isn’t* a spot-on critique. It’s dead wrong about one point and misleading about another. It’s terrific to see people that care about the literature and critique, but please, verify that you’re actually correct about this stuff before posting. Repeating the same half-truths and half-understandings that are passed down from your mentors might be great for scoring points with your friends, but this is honestly harmful to research and patient care because it makes people skeptical of even good trials that are doing things correctly, just because you don’t have a full understanding of the concepts.
Hi Rory.
I’m from Brazil and I’m an intensivist. I really believe that this trial will cause a lot of harm to patients here. I don’t see any kind of self-criticism among our fellow neurologists about tPA in stroke so far. They seem to believe that they have the silver bullet, and now they will shoot in all directions in the dark.
Adjustment for baseline imbalance in covariates is meant to improve the precision of the estimate of the treatment effect but not introduce any bias. In this trial, however, for the primary outcome the adjusted analysis has shifted the point estimate and reduced precision (that is, the width of the confidence interval). How is that explained?
See the following Twitter thread for a critical take on this post from statisticians who do clinical trials. Highly informative take:
https://twitter.com/adalthousephd/status/1127226898050842624?s=21
I have no dog in the “tPA” fight, but as a trial statistician, there is one thing that this post clearly got wrong, and another that is represented poorly. 1) It is perfectly acceptable and in fact recommended statistical practice to perform regression analysis in a randomized controlled trial with adjustment for a limited number of baseline variables known to have strong associations with outcome. One of my comments below lists a large number of references supporting this, and I have links to a few excellent Twitter threads demonstrating why this is so. It’s not commonly done, but that’s more…
And, candidly, it is maddening to see a fellow commenter on this thread who is clearly unaware of this, citing Stephen Senn’s writings about this “foolishness” of doing a regression analysis in an RCT, when Senn is, in fact, an advocate for the use of covariable adjustment, a verifiable fact if you read any of his recent articles.
Thanks for your comments Andrew. This is a very interesting topic. As I said on twitter when we were discussing this, I never claimed adjusted analysis should never be used in RCTs, rather they should be used in specific situations. In fact, this is exactly what is recommended by the CONSORT guidelines (http://www.consort-statement.org/Media/Default/Downloads/CONSORT%202010%20Explanation%20and%20Elaboration%20Document-BMJ.pdf). The studies you cite are interesting but in the end they are simulations. As we discussed on twitter you have failed to highlight one medical truth which was revealed simply because of an adjusted analysis. Adjusted analysis or not a study like EXTEND is still at high…
Rory, let me preface my reply with this: it’s great that you want to critically evaluate the literature, and presumably that comes from a place of wanting the best possible care for patients. I want the same thing – despite comments about trickery and magic, statisticians are probably the most unconflicted actors in the entire research enterprise. We just want to get the right estimates of the things we’re trying to estimate. Let me also state this so any “conflicts” are on the table: I am a trial statistician at the University of Pittsburgh, principally working on trials in palliative…
Andrew, You seem to continue to misinterpret my blog post. You are arguing over the statistical minutiae. Maybe that is because, as you said, you plead “a little bit of ignorance about the specific disease area and measure”. Let me make this fairly simple. You feel “the use of a regression analysis in a trial is appropriate, and the decision-making process that led to the final SAP appears to be reasonable and justified”. My take is it doesn’t really matter what adjusted analysis you use. No amount of statistical adjustment can make up for the fact that this is a…
@ Rory: In your post, you asked two important questions: 1) With what statistical wizardry were they able to transform an entirely unimpressive p-value of 0.35 to 0.04? The authors were able to transform negative results into positive ones by adding age and NIHSS to their multivariate model, two major prognostic factors that predict mRS. The adjusted analysis led to a relative risk of 1.44, while the unadjusted figure was 1.20. This represents a 0.24/1.20 = 20% increase in effect size in the adjusted versus the unadjusted analysis, i.e. very large for an RCT, suggesting confounding bias. Of note, groups were not balanced well at baseline…
I’ve stumbled across this several months after publication, and found both Rory’s thoughts and Andrew’s statistical counter-points in the comments to be interesting reads. One point not discussed is the final paragraph of this blog, which implies that the failure to include patients presenting within 4.5 hours of onset is a cynical move. On the contrary, EXTEND only included patients with salvageable penumbra on CT-P and randomised to either tPA or placebo. Patients within 4.5 hours of onset are known to benefit from tPA (regardless of your thoughts on the original trials, this is considered best clinical practice) and so…