The Adventure of the Cardboard Box

A number of seemingly negative studies have been published recently in which the authors, using ordinal analysis, claim their trials are in fact positive. Though not the first, the most notable of these was the IST3 trial (1), published in May of last year. In this trial, the largest to date comparing thrombolytics to placebo in acute ischemic stroke, the authors found no significant difference in their primary endpoint, “alive and independent” at six months. In fact, mortality at six months was identical (27%) in the two groups. The only truly significant finding was that t-PA killed significantly more people in the first seven days (an absolute mortality difference of 4%). In a moment of statistical wizardry, the authors claimed that on the basis of their secondary endpoint the trial was in fact a positive study. This secondary endpoint was defined as the “shift in outcomes” on the Oxford Handicap Scale (OHS). For this outcome measure they used a compressed OHS in which 0, 1, 2, and 3 remained independent levels and 4, 5, and 6 were compressed into a single level (because some things, they said, are worse than death). Using ordinal logistic regression and further statistical adjustment for “imbalances” between the groups, they found an odds ratio of 1.26 in favor of t-PA.

Many far more intelligent physicians have written about and discussed the multitude of flaws in this paper and its general interpretation (2,3,4). Recently, though, two major articles have followed in IST3’s statistical footsteps, using ordinal analysis to claim significance in otherwise negative datasets.

The first of these is actually the 18-month follow-up of IST3 (5), in which the authors claim, on the basis of their mysterious ordinal analysis, that a virtually identical, statistically insignificant difference between treatment and placebo is further proof of t-PA’s efficacy. Once again, it is only after the authors make a second statistical adjustment for baseline imbalances in seemingly identical groups (6) that their findings become “significant,” with an odds ratio of 1.28.

The second of the two trials is the INTERACT2 trial (7), published in the New England Journal of Medicine in June of 2013. In this trial, the authors sought to show that aggressive management of blood pressure in patients with acute intracerebral hemorrhage (ICH) was beneficial. The authors’ primary endpoint was death or major disability (defined as a score of 3, 4, 5, or 6 on the Modified Rankin Scale (MRS) at 90 days). Their “key” secondary endpoint at the onset of the trial was death or severe disability in patients treated within four hours of onset of ICH (an endpoint that was found to be statistically insignificant). After the publication of IST3, ordinal analysis gained “acceptance” as a way of assessing neurological outcomes. The authors quickly redefined their key secondary outcome as “physical function across all seven levels of the modified Rankin scale as determined with the use of ordinal analysis.”

Ordinal analysis as a tool to evaluate neurological outcomes is a form of black box. Data is inserted and answers are revealed on the other side, with no awareness of how those answers were derived. The situation is further clouded when the data that emerges is in a form we cannot interpret. Thus we are left either to trust the authors’ claims that their intervention is beneficial or to discard the result entirely because it cannot be integrated into our knowledge set. It is for these reasons that gaining a basic understanding of ordinal analysis is important. Primarily: is ordinal analysis an appropriate tool for the assessment of neurological outcomes, and if so, how do we interpret its results? The following is not an attempt to derive the formulas needed to perform ordinal analysis but merely to explain the strengths and weaknesses of such statistical machinations.

A paper published in Stroke in 2007 (8), entitled “Novel End Point Analytic Techniques and Interpreting Shifts Across the Entire Range of Outcome Scales in Acute Stroke Trials,” seems like an appropriate place to begin answering these questions. In this paper the author, Dr. Saver, proposes that the traditional dichotomized method we currently use to evaluate neurological outcomes drastically underestimates the benefit and/or harm of any specific intervention. In particular, we are only able to evaluate the benefit of the intervention across one seemingly arbitrary change in function. For example, in most trials evaluating thrombolytics in stroke we treat scores of 0 and 1, or 0, 1, and 2, on the MRS as a good outcome and 3, 4, 5, and 6 as bad outcomes. This definition limits what effects we can measure. If the intervention shifts a large number of patients from death or near death (a 5 or 6) to moderately disabled (a 3), then using a dichotomous cutoff we would not be able to detect this change. Likewise, with a cutoff of 0 to 2, if the intervention shifted patients’ outcomes from slightly disabled (a 2) to no symptoms at all (a 0), these important changes would go unobserved. Ordinal analysis attempts to overcome this obvious flaw by analyzing the individual changes in each patient across the entire spectrum of the scale in question.
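The blind spot described above can be made concrete with a small sketch. The counts below are hypothetical (they are not taken from any trial), and the whole-scale summary used here, the probability that a randomly chosen treated patient fares better than a randomly chosen control patient, is only one simple way of reading the entire scale. It is an assumption chosen for illustration, not the method any of these authors report using.

```python
# Hypothetical outcome counts per score (0 = no symptoms ... 6 = dead),
# 100 patients per arm. These numbers are invented for illustration only.
CONTROL = [10, 10, 10, 10, 10, 25, 25]
TREATED = [10, 10, 10, 35, 10, 10, 15]  # 25 patients moved from a 5 or 6 to a 3


def good_outcome_rate(counts, cutoff=2):
    """Dichotomous endpoint: share of patients scoring at or below the cutoff."""
    return sum(counts[: cutoff + 1]) / sum(counts)


def prob_treated_better(treated, control):
    """Whole-scale summary: chance that a random treated patient has a lower
    (better) score than a random control patient, counting ties as one half."""
    wins = 0.0
    treated_below = 0  # treated patients with a strictly better score so far
    for t_k, c_k in zip(treated, control):
        wins += c_k * (treated_below + 0.5 * t_k)
        treated_below += t_k
    return wins / (sum(treated) * sum(control))


if __name__ == "__main__":
    print("good outcome (0-2), control:", good_outcome_rate(CONTROL))
    print("good outcome (0-2), treated:", good_outcome_rate(TREATED))
    print("P(treated better than control):", prob_treated_better(TREATED, CONTROL))
```

With these invented counts the dichotomous endpoint is 30% in both arms, while the whole-scale summary comes out at roughly 0.59 rather than the 0.50 expected if the two arms were identical.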

The ability of ordinal analysis to assess the efficacy of an intervention in a more global fashion has a secondary benefit. Specifically, this mathematical manipulation increases the statistical power of your cohort (10), which the authors of all three of these trials used to their advantage. This increase in power can also be a weakness if it is not taken into account when calculating the power of your study (9). In such cases, clinically insignificant differences may appear statistically significant and type I error can occur. The second, more important limitation involves the multitude of assumptions that have to be made to use ordinal analysis in parallel group studies (11).
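The gain in power can be illustrated with a rough simulation. Everything in this sketch is an assumption made for illustration: the control distribution is invented, the treatment effect is a single common odds ratio of 1.5 applied at every cut point, and a rank-based (Wilcoxon-Mann-Whitney) test with a normal approximation stands in for ordinal analysis. It is not the ordinal logistic regression used in the trials discussed here.

```python
import math
import random

# Hypothetical control-arm probabilities for scores 0..6 (invented numbers).
CONTROL_PROBS = [0.10, 0.15, 0.15, 0.15, 0.15, 0.10, 0.20]


def shift_distribution(control, odds_ratio):
    """Apply one common odds ratio to the odds of scoring at or below each cut."""
    shifted, cum, prev = [], 0.0, 0.0
    for p in control[:-1]:
        cum += p
        odds = odds_ratio * cum / (1.0 - cum)
        cum_treated = odds / (1.0 + odds)
        shifted.append(cum_treated - prev)
        prev = cum_treated
    shifted.append(1.0 - prev)
    return shifted


def z_dichotomous(treated, control, cutoff=2):
    """Two-proportion z statistic for 'score at or below cutoff'."""
    n_t, n_c = sum(treated), sum(control)
    good_t, good_c = sum(treated[: cutoff + 1]), sum(control[: cutoff + 1])
    pooled = (good_t + good_c) / (n_t + n_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    return (good_t / n_t - good_c / n_c) / se if se > 0 else 0.0


def z_rank(treated, control):
    """Mann-Whitney U statistic (tie-corrected variance), as a z score."""
    n_t, n_c = sum(treated), sum(control)
    n = n_t + n_c
    u, treated_below, tie_term = 0.0, 0, 0
    for t_k, c_k in zip(treated, control):
        u += c_k * (treated_below + 0.5 * t_k)
        treated_below += t_k
        tied = t_k + c_k
        tie_term += tied ** 3 - tied
    variance = n_t * n_c / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    return (u - n_t * n_c / 2.0) / math.sqrt(variance)


def draw_counts(rng, probs, n):
    """Sample n patients and return the count at each score."""
    counts = [0] * len(probs)
    for score in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[score] += 1
    return counts


def simulate_power(n_per_arm=250, n_trials=1000, seed=1):
    """Share of simulated trials in which each test gives |z| > 1.96."""
    rng = random.Random(seed)
    treated_probs = shift_distribution(CONTROL_PROBS, 1.5)
    hits_dichotomous = hits_rank = 0
    for _ in range(n_trials):
        treated = draw_counts(rng, treated_probs, n_per_arm)
        control = draw_counts(rng, CONTROL_PROBS, n_per_arm)
        hits_dichotomous += int(abs(z_dichotomous(treated, control)) > 1.96)
        hits_rank += int(abs(z_rank(treated, control)) > 1.96)
    return hits_dichotomous / n_trials, hits_rank / n_trials


if __name__ == "__main__":
    print("power (dichotomous, whole scale):", simulate_power())
```

Under these assumptions the whole-scale test should reject the null hypothesis more often than the dichotomized test on the very same simulated patients, which is the extra power referred to above. The same mechanism is what allows a small shift to reach statistical significance.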

The primary concept of ordinal analysis is examining how the groups in question change from their baseline status. This is most effectively observed in crossover studies, where the effects of both the placebo and the active intervention can be observed in a single cohort (11). Obviously, for trials like INTERACT2 this cannot be done. In the case of parallel cohorts, these baseline measures can be estimated in a variety of ways. First, you can measure the baseline levels of the two groups (placebo and treatment) and measure how many patients were helped or harmed by the intervention or placebo. Unfortunately, in neither IST3 nor INTERACT2 did the investigators measure a baseline MRS or OHS score. A second option is to assume that the groups are equal at baseline and, since the placebo should theoretically have no effect, that the intervention group’s baseline is similar to the placebo group’s. In this case you would then compare the placebo and treatment groups to each other. The last method proposed is to have a group of experts look at the entirety of the data and each estimate the degree of shift the treatment arm caused when compared to the placebo arm (12). Each of these options seems to add an increasing degree of bias to the analysis of a dataset. Whichever technique an author chooses, it is important that they are transparent when presenting their dataset, to allow readers to take into account these various threats to the trial’s validity.

Despite these flaws, let’s say for the moment that we accept ordinal analysis as an adequate technique for examining neurological outcomes. How, then, do we go about interpreting the findings from these trials? In the original IST3 trial (1) the authors tell us that, using ordinal analysis, they were able to find a statistically significant difference with an odds ratio of 1.26 in favor of t-PA. What does this mean? With an odds ratio alone we are not able to calculate the actual clinical impact of this statistical difference. Using the dichotomized endpoint, the NNT or NNH is easy to calculate, and thus one is capable of determining an intervention’s clinical efficacy. Using an ordinal analysis and presenting only odds ratios leaves readers without an understanding of the true significance of these findings. It is possible to calculate NNTs and NNHs when using ordinal analysis, but again assumptions have to be made and the calculations are far more complicated than those derived from dichotomous endpoints (11).
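To see why an odds ratio alone cannot be turned into an NNT, consider the following sketch. It makes two loud assumptions: that the odds ratio of 1.26 can be read as applying to a single good-versus-bad split of the scale, and that the control-arm rate of a good outcome is known. The baseline rates below are hypothetical and were chosen only to show how much the answer moves.

```python
def rate_with_treatment(baseline_rate, odds_ratio):
    """Convert a baseline probability into the probability implied by an odds ratio."""
    odds = odds_ratio * baseline_rate / (1.0 - baseline_rate)
    return odds / (1.0 + odds)


def nnt(baseline_rate, odds_ratio):
    """Number needed to treat = 1 / absolute difference in the outcome rate."""
    return 1.0 / abs(rate_with_treatment(baseline_rate, odds_ratio) - baseline_rate)


if __name__ == "__main__":
    # Hypothetical control-arm rates of a good outcome (not trial data).
    for baseline in (0.10, 0.35, 0.50):
        treated = rate_with_treatment(baseline, 1.26)
        print(f"baseline {baseline:.0%} -> treated {treated:.1%}, "
              f"NNT about {nnt(baseline, 1.26):.0f}")
```

The same odds ratio yields a very different NNT depending on the baseline rate, so reporting the odds ratio without the underlying proportions leaves the clinical magnitude undetermined.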

Let us return to INTERACT2. The trial’s primary endpoint, though negative, is clearly trending towards favoring the treatment group. The authors attempt to strengthen this claim by presenting the results of their ordinal analysis, which provides a statistically significant odds ratio of 0.87 in favor of the intense therapy group. Again following IST3’s lead, they fail to provide us with the method they used to calculate this odds ratio or any attempt to quantify its magnitude. Thus they leave us in a state of clinical purgatory, unclear on how to interpret their findings. They themselves state in their study’s appendix that an absolute difference of less than 7% would not be clinically significant, and yet they use the increased power that an ordinal analysis provides to claim a benefit that, though impossible to truly quantify, is clearly lower than the 7% difference they defined as the lower limit of clinical significance.
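A rough bound can be sketched for the comparison with the 7% threshold. The sketch assumes, and this is an assumption rather than anything the trial report states, that the odds ratio of 0.87 can be applied to a bad-versus-good split at any single cut point of the scale (as a proportional odds model would imply). It then scans every possible control-arm risk and records the largest absolute difference that such an odds ratio could produce.

```python
import math


def absolute_difference(baseline_risk, odds_ratio):
    """Absolute change in risk implied by an odds ratio at a given baseline risk."""
    odds = odds_ratio * baseline_risk / (1.0 - baseline_risk)
    return abs(odds / (1.0 + odds) - baseline_risk)


def largest_absolute_difference(odds_ratio, steps=999):
    """Scan baseline risks 0.001 ... 0.999 and return the largest difference found."""
    return max(
        absolute_difference(i / (steps + 1), odds_ratio) for i in range(1, steps + 1)
    )


def closed_form_bound(odds_ratio):
    """Analytic maximum of the same quantity: |1 - sqrt(OR)| / (1 + sqrt(OR))."""
    root = math.sqrt(odds_ratio)
    return abs(1.0 - root) / (1.0 + root)


if __name__ == "__main__":
    print("largest absolute difference for OR 0.87:", largest_absolute_difference(0.87))
    print("closed-form bound:", closed_form_bound(0.87))
```

Under that assumption the largest absolute difference at any cut point is roughly 3.5 percentage points, whatever the control-arm risk, which sits well below the 7% the authors themselves named.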

John Ziman in Reliable Knowledge said, “Scientific knowledge is distinguished from other intellectual artifacts of human society by the fact that its contents are consensible…Each message should not be so obscure or ambiguous that the recipient is unable either to give whole-hearted assent or to offer well-founded objection.”

In all three of these trials we are able neither to give our wholehearted assent nor to offer a well-founded objection. With regard to the two papers published on IST3 (1,5), the obvious early harms make whatever small benefits there may be inconsequential. In the INTERACT2 trial this is not the case. Since there were no obvious harms in aggressively lowering the blood pressure, not quantifying the benefits claimed leaves readers in a state of uncertainty. The authors’ lack of transparency regarding their manipulation of data renders the findings unusable in the clinical arena.

Sources Cited:

1. The IST-3 collaborative group. The benefits and harms of intravenous thrombolysis with recombinant tissue plasminogen activator within 6 h of acute ischaemic stroke (the third international stroke trial [IST-3]): a randomised controlled trial. Lancet 2012; 379
2. Hoffman JR, Cooper RJ. How is more negative evidence being used to support claims of benefit: The curious case of the third international stroke trial (IST-3). Emergency Medicine Australasia 2012; 24: 473–476
3. Newman D. Delusions of Benefit in the International Stroke Trial.
4. Radecki RP, Chathampally YG, Press GM. rt-PA and Stroke: Does IST-3 Make It All Clear or Muddy the Waters? Annals of Emergency Medicine 2012; 60(5): 666–667
5. The IST-3 collaborative group. Effect of thrombolysis with alteplase within 6 h of acute ischaemic stroke on long-term outcomes (the third International Stroke Trial [IST-3]): 18-month follow-up of a randomised controlled trial. The Lancet Neurology, 21 June 2013
6. Sandercock P, Lindley R, Wardlaw J, et al. Update on the third international stroke trial (IST-3) of thrombolysis for acute ischaemic stroke and baseline features of the 3035 patients recruited. Trials 2011; 12: 252
7. Anderson CS, et al. Rapid blood-pressure lowering in patients with acute intracerebral hemorrhage. N Engl J Med 2013 June 20
8. Saver JL. Novel End Point Analytic Techniques and Interpreting Shifts Across the Entire Range of Outcome Scales in Acute Stroke Trials. Stroke 2007; 38: 3055–3062
9. Tilley BC. Contemporary Outcome Measures in Acute Stroke Research: Choice of Primary Outcome Measure and Statistical Analysis of the Primary Outcome in Acute Stroke Trials. Stroke 2012; 43: 935–937
10. Bath et al. Statistical Analysis of the Primary Outcome in Acute Stroke Trials. Stroke 2012; 43: 1171–1178
11. Guyatt GH, et al. Interpreting treatment effects in randomised trials. BMJ 1998; 316: 690–693
12. Saver JL. Number needed to treat estimates incorporating effects over the entire range of clinical outcomes: novel derivation method and application to thrombolytic therapy for acute stroke. Arch Neurol 2004; 61(7): 1066–1070
13. The National Institute of Neurological Disorders and Stroke (NINDS) rt-PA Stroke Study Group. Tissue plasminogen activator for acute ischaemic stroke. N Engl J Med 1995; 333: 1581–1587
14. Hacke W, Kaste M, Fieschi C. Intravenous thrombolysis with recombinant tissue plasminogen activator for acute hemispheric stroke. The European Cooperative Acute Stroke Study (ECASS). JAMA 1995; 274: 1017–1025
15. Hacke W, Kaste M, Fieschi C. Randomised double-blind placebo-controlled trial of thrombolytic therapy with intravenous alteplase in acute ischaemic stroke (ECASS II). Lancet 1998; 352: 1245–1251
16. Hacke W, Kaste M, Bluhmki E. Thrombolysis with alteplase 3 to 4.5 hours after acute ischemic stroke. N Engl J Med 2008; 359: 1317–1329
17. Clark WM, Wissman S, Albers GW, Jhamandas JH, Madden KP, Hamilton S. Recombinant tissue-type plasminogen activator (Alteplase) for ischemic stroke 3 to 5 hours after symptom onset. The ATLANTIS Study: a randomized controlled trial. Alteplase Thrombolysis for Acute Non Interventional Therapy in Ischemic Stroke. JAMA 1999; 282: 2019–2026