The Adventure of the Red Circle

When it comes to non-traumatic intracranial hemorrhage (ICH), the onus on the emergency physician is diagnosis, while location and severity are of far less importance. Once the diagnosis is made and the initial stabilization complete, there is very little for us to do other than notify the ICU team and contact the neurosurgeon, who, with varying degrees of pleasantry (depending on the time of day or night you call him or her), asks us to elevate the head of the bed, control the systolic blood pressure and correct whatever coagulopathies may be present, all of which we have most likely already taken care of. As of now there has been no evidence that urgent surgical intervention is beneficial. In fact, the STITCH trial, published in the Lancet in 2005, conclusively showed no benefit to early surgical intervention (4). Fear not. All is not lost. Among the multitude of subgroup analyses they performed, the authors found that patients with superficial cortical ICHs and no ventricular involvement had a statistically non-significant improvement with early surgery over conservative management. Instead of claiming truth from a p-value, which was most likely statistical chance, the authors chose to use subgroup analysis in the fashion it was intended: hypothesis building. These are the ashes from which STITCH II was born.

STITCH II enrolled a highly selected subset of ICH patients with lobar bleeds close to the surface of the cortex (<1 cm) and no ventricular involvement (3). Had early surgery proven beneficial in these patients, STITCH II would have forced us to step up our game. We would be required to know the location, size, and depth of the bleed and whether or not there was ventricular involvement. Had STITCH II been positive, these features would be the radiologic equivalent of a STEMI.

STITCH II was negative…

The study found no statistically significant improvement in patients treated with early surgical intervention when compared to conservative management. But it may not be so clear cut. The original STITCH trial was unequivocally negative: there were no trends toward lower mortality or better neurological outcome and, other than a single subgroup found while dredging the data, not a single p-value that came close to crossing significance. In STITCH II, however, this was not the case. There were clear trends toward improvement in both mortality and functional neurological outcomes. A 6% absolute risk reduction in mortality with a p-value of 0.095 was the closest this trial got to significance, but one cannot help but notice that the treatment arm did better.
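For a sense of scale, the 6% figure can be turned into a number needed to treat. This is only back-of-the-envelope arithmetic: it assumes the 6% absolute reduction is real, which the trial did not establish, and the function name is mine rather than anything from the paper.

```python
import math


def number_needed_to_treat(absolute_risk_reduction: float) -> int:
    """Return the NNT (rounded up) for an absolute risk reduction given as a fraction."""
    if absolute_risk_reduction <= 0:
        raise ValueError("NNT is only defined for a positive risk reduction")
    return math.ceil(1 / absolute_risk_reduction)


# 6% absolute mortality reduction quoted in the text, taken at face value.
nnt = number_needed_to_treat(0.06)
print(f"Patients operated on early per death averted (if the effect is real): {nnt}")
```

Because the p-value was 0.095, any such number carries wide uncertainty and should be read as an illustration of magnitude, not as an estimate to quote.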

When assessing an intervention’s efficacy it is important to take each trial in the context of the totality of the literature. Failing to take this perspective makes it difficult to distinguish the signal from the noise. It took over 17,000 people in ISIS-2 (1) to show a small but real benefit for the use of aspirin in acute MI. In GUSTO IIb (2) it was not until the outcomes of death, reinfarction and stroke were combined into a single composite measure that the superiority of PCI over t-PA became statistically significant. Had ISIS-2 been a smaller cohort, or had GUSTO IIb’s results not been duplicated on multiple occasions, we might not be in the “was aspirin given, door to balloon time” obsessed world in which we live today. As a self-proclaimed nihilist, I find it important to keep an open mind and remember the humble beginnings of the likes of PCI and how far it has come.

It is with this mindset that I turn my focus towards STITCH II (3). Published August 2, 2013 in the Lancet, STITCH II addresses the importance of early surgery in patients with cortical intracranial hemorrhage. Let me start by complimenting the authors on an excellent study, and on the honesty and balance they showed when approaching their results. STITCH II was a natural and appropriate reaction to the negative results found in STITCH (4). STITCH II, like its predecessor, compares early surgery to conservative management in patients with ICH. The major difference between the cohorts is the inclusion criteria, with STITCH II’s tailored to answer the hypothesis derived from STITCH.

STITCH II was a pragmatic, randomized controlled trial that allowed the surgeons to decide what surgical approach would be used in the early intervention group. That being said, 99% of the interventions were craniotomies, so in practice we are comparing early craniotomy to conservative management. The primary outcome by which the authors judged their intervention to be efficacious or not was called the “Prognosis-Based Dichotomized Extended Glasgow Outcome Scale (GOSE)”.

Many authors have argued that the traditional dichotomous endpoint used to assess neurological outcomes is too crude a tool to account for the more subtle changes that may occur with a given intervention (for those interested in such discussions see earlier post on ordinal analysis). In disease states such as ICH, where a large proportion of the cohort is left severely disabled from the event, minor changes in this group’s function will go undetected when a single dichotomous cutoff is used to measure outcomes. To solve this, some authors have suggested the use of ordinal analysis or, in this case, a prognosis-based approach (5).

The prognosis-based model used by these authors is described in an article by Murray et al published in the Journal of Neurotrauma (6). They took a cohort from 2 previous studies on head trauma (7) and, using logistic regression, retrospectively derived a prognostic rule based on GCS, age and hematoma size. They then divided the group into good, intermediate and poor prognosis based on this rule. The authors go on to proclaim their prognostic tool a success, as it allowed them to detect smaller incremental benefits between the intervention and control groups. Whether this is a true benefit that would otherwise have gone unnoticed or a type I error created by amplifying the noise is hard to say.
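The mechanism of a prognosis-based (sliding) dichotomy is easier to see in a few lines of code. The sketch below is hypothetical: the band names and the threshold chosen for each band are mine, picked only to illustrate the idea, and are not taken from Murray et al or from STITCH II. The eight outcome levels are the standard GOSE categories.

```python
# GOSE categories ordered from worst to best outcome.
OUTCOME_LEVELS = [
    "dead",
    "vegetative state",
    "lower severe disability",
    "upper severe disability",
    "lower moderate disability",
    "upper moderate disability",
    "lower good recovery",
    "upper good recovery",
]
RANK = {level: i for i, level in enumerate(OUTCOME_LEVELS)}

# Hypothetical thresholds: the worst outcome still counted as "favorable"
# for a patient in each prognosis band (illustrative values only).
FAVORABLE_FROM = {
    "good": "lower moderate disability",
    "poor": "upper severe disability",
}


def favorable_fixed(outcome: str) -> bool:
    """Traditional approach: one cutoff applied to every patient."""
    return RANK[outcome] >= RANK["lower moderate disability"]


def favorable_sliding(outcome: str, prognosis_band: str) -> bool:
    """Sliding dichotomy: the cutoff depends on the patient's baseline prognosis."""
    return RANK[outcome] >= RANK[FAVORABLE_FROM[prognosis_band]]


# A poor-prognosis patient who reaches upper severe disability is a failure
# under the fixed cutoff but a success under the sliding one.
print(favorable_fixed("upper severe disability"))
print(favorable_sliding("upper severe disability", "poor"))
```

The point of the design is that a patient expected to do badly can still register as a success if they do better than expected, which a single fixed cutoff cannot capture.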

The authors of STITCH II apply a similar tool when sorting their patients into good or poor prognosis groups. The following is the exact formula they use:

(10 x GCS) – age – (0.64 x Volume) = Prognostic Group
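As a quick illustration, the formula above translates directly into a one-line function. The parameter names and the units (age in years, hematoma volume in mL) are my assumptions, since the formula as quoted does not state them, and the example patient is invented.

```python
def prognostic_score(gcs: int, age_years: float, volume_ml: float) -> float:
    """Prognostic score as quoted in the text: (10 x GCS) - age - (0.64 x volume).

    Units are assumed (years, mL); they are not stated alongside the formula.
    """
    return 10 * gcs - age_years - 0.64 * volume_ml


# Hypothetical patient: GCS 15, 60 years old, 20 mL hematoma.
print(round(prognostic_score(gcs=15, age_years=60, volume_ml=20), 1))
```

Note how heavily the score is weighted towards GCS: a single GCS point moves it as much as ten years of age or roughly 16 mL of hematoma.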

The predefined cutoff for good vs poor prognosis was 27.672. This very specific (down to three decimal places) cutoff was arrived at by retrospectively fitting the patients from the original STITCH trial who met STITCH II’s inclusion criteria into this formula. 27.672 was the median of this group (9). This prognostic tool was seemingly derived (at least in concept) from a cohort of patients with an entirely different disease state and then retrospectively derived and internally validated using a small subgroup of 157 patients from the original STITCH trial. When constructing a prognostic tool such as this, one would ideally plot all the subjects on a three-dimensional graph with the X, Y, and Z axes representing GCS, age and hematoma size. The goal of the prognostic tool is to find the dividing plane that most effectively separates those patients with good functional outcomes and low mortality from those with poor functional outcomes and high mortality. Deriving this plane from a population of 157 subjects will invariably lead to a great deal of over-fitting, and the tool will fail to perform as well when applied to any other cohort (8). Since the goal of this prognostic tool was not to predict outcomes in the general population, but to effectively distinguish patients with good and poor prognosis in this specific cohort, it is curious the authors chose to use the predefined cutoff of 27.672 obtained from a previous cohort. Surely a more effective tool would have arisen if they had derived the cutoff that best fit their own population. That being said, 35% of the cohort was classified in the poor prognosis subgroup using this predefined definition. At 6 months, 39% of the poor prognosis group had died and 33% had severe disability. In contrast, in the good prognosis group only 12% of the patients had died and 17% had severe disability.

Obviously the prognostic tool they used has some ability to separate good from poor outcomes, and how could it not, given the components it is built from? Whether or not this was the most efficient version, or whether the flaws in its derivation affected the outcomes in the paper, we cannot readily assess.
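To get a feel for how much a median-derived cutoff can wobble in a cohort of 157, here is a rough bootstrap simulation. The patients are entirely synthetic: the GCS, age and volume distributions below are invented for illustration and do not come from STITCH or STITCH II, so only the width of the resulting interval, not its location, is of interest.

```python
import random
import statistics


def prognostic_score(gcs, age_years, volume_ml):
    """Score as quoted in the text: (10 x GCS) - age - (0.64 x volume)."""
    return 10 * gcs - age_years - 0.64 * volume_ml


def synthetic_scores(n, rng):
    """Scores for n made-up patients (all distributions are assumptions)."""
    scores = []
    for _ in range(n):
        gcs = rng.randint(8, 15)        # hypothetical GCS range
        age = rng.uniform(40, 90)       # hypothetical age range, years
        volume = rng.uniform(10, 100)   # hypothetical hematoma volume, mL
        scores.append(prognostic_score(gcs, age, volume))
    return scores


def bootstrap_median_interval(scores, n_resamples, rng):
    """Approximate 95% percentile interval for the median via resampling."""
    medians = sorted(
        statistics.median(rng.choices(scores, k=len(scores)))
        for _ in range(n_resamples)
    )
    low = medians[int(0.025 * n_resamples)]
    high = medians[int(0.975 * n_resamples) - 1]
    return low, high


rng = random.Random(0)  # fixed seed so the run is repeatable
scores = synthetic_scores(157, rng)
low, high = bootstrap_median_interval(scores, 2000, rng)
print(f"sample median: {statistics.median(scores):.3f}")
print(f"95% bootstrap interval for the median: {low:.1f} to {high:.1f}")
```

Under these made-up inputs the interval spans several score points, which is a reminder that the third decimal place of a cutoff derived this way carries no real information.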

Using this prognosis-based GOSE dichotomous outcome, the authors found that 41% of the surgical group vs 38% of the initial conservative group had a favorable outcome at 6 months. This amounts to an absolute difference of 3%, which was not statistically significant. For those more comfortable using the modified Rankin Scale (mRS), the prognosis-based model found 47% of the early surgical patients with a favorable outcome at 6 months compared to 43% in the initial conservative treatment group. I have included the calculations for the GOSE and mRS when using a pure dichotomous cutoff without the prognosis-based correction factored in. Favorable outcome is defined as moderate disability or good recovery on the GOSE and as a 0, 1, or 2 on the mRS.

As you can see, the prognosis-based model, though it increased the number of patients defined as having a favorable outcome, did not change the absolute difference between the two groups. No subtle benefits of early surgery were uncovered when applying this more sophisticated approach. This is an interesting, but in this case unnecessary, statistical manipulation.

STITCH II found a 6% absolute mortality benefit in those undergoing early surgery. This difference threatened significance with a p-value of 0.095. The study was powered to detect a 12% absolute difference in its primary endpoint, the difference the authors deemed “clinically” significant, but a 6% difference in mortality is not something to be discarded as inconsequential. In the final analysis, the conservative treatment group had 3% more patients in the poor prognosis group when compared to the early surgery group (36% vs 33%). Did this affect their outcome? As per the authors’ own subgroup analysis, the benefit of early surgery was seen exclusively in the poor prognosis group, with trends to harm in those with a good prognosis. Is there an even smaller subgroup for which early surgery may be beneficial? Are we destined for a STITCH 3?
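Why a trial sized for a 12% difference struggles with anything smaller can be shown with the standard normal-approximation sample size formula for comparing two proportions. This is a sketch under stated assumptions, not a reconstruction of the trial's own power calculation: the two-sided alpha of 0.05 and 80% power are conventional defaults I have chosen, and the 38% baseline is simply the observed favorable outcome rate in the conservative arm, used as a stand-in.

```python
import math
from statistics import NormalDist


def n_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to detect p_treatment vs p_control.

    Uses the unpooled normal approximation for two independent proportions.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


baseline = 0.38  # stand-in: observed favorable outcome rate, conservative arm
n_12 = n_per_arm(baseline, baseline + 0.12)  # the difference the trial was powered for
n_6 = n_per_arm(baseline, baseline + 0.06)   # half that difference
n_3 = n_per_arm(baseline, baseline + 0.03)   # the difference actually observed

for label, n in [("12%", n_12), ("6%", n_6), ("3%", n_3)]:
    print(f"absolute difference {label}: about {n} patients per arm")
```

Halving the difference one hopes to detect roughly quadruples the required enrollment, which is why effects smaller than the design target can look like trends without ever reaching significance.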

Presently we are left with more uncertainty than before STITCH II was published. There may in fact be a group of patients who benefit from early invasive surgery after spontaneous ICH. Patients who are too sick, those with basal ganglia or thalamic bleeds, with ventricular involvement, or with an initial GCS less than 8, do poorly, as demonstrated in the original STITCH trial (4). Patients who are too healthy, those with an overall promising prognosis, also show no benefit and possible harm from early intervention (3). Thus we are thrown into a Goldilocks-like dilemma in trying to identify the patient population for which early surgery may prove beneficial. It behooves us to continue to examine the question in a scientific manner rather than to taste each bowl of porridge and subjectively declare “too hot” or “too cold” until all our taste buds have been seared away or entropy has warmed or cooled all the respective bowls to a false sense of “just right”…

Sources Cited:

1. ISIS-2 Collaborative Group. Randomised trial of intravenous streptokinase, oral aspirin, both, or neither among 17,187 cases of suspected acute myocardial infarction: ISIS-2. Lancet, 13 August 1988, Vol. 332, Issue 8607, pp. 349–360.
2. The Global Use of Strategies to Open Occluded Coronary Arteries (GUSTO-IIb) Investigators. A comparison of recombinant hirudin with heparin for the treatment of acute coronary syndromes. N Engl J Med, 1996, Vol. 335, pp. 775–782.
3. AD Mendelow, BA Gregson, EN Rowan, GD Murray, A Gholkar, PM Mitchell, for the STICH II Investigators. Early surgery versus initial conservative treatment in patients with spontaneous supratentorial lobar intracerebral haematomas (STICH II): a randomised trial. Lancet, 3–9 August 2013, Vol. 382, Issue 9890, pp. 397–408.
4. AD Mendelow, BA Gregson, HM Fernandes, for the STICH investigators. Early surgery versus initial conservative treatment in patients with spontaneous supratentorial intracerebral haematomas in the International Surgical Trial in Intracerebral Haemorrhage (STICH): a randomised trial. Lancet, 2005, Vol. 365, pp. 387–397.
5. AD Mendelow, BA Gregson, PM Mitchell et al. Surgical Trial in Lobar Intracerebral Haemorrhage (STICH II) Protocol. Trials, 2011, Vol. 12, p. 1.
6. GD Murray, D Barer, S Choi et al. Design and analysis of phase III trials with ordered outcome scales: the concept of the sliding dichotomy. J Neurotrauma, 2005, Vol. 22, pp. 511–517.
7. CW Hukkelhoven, EW Steyerberg, E Farace, JD Habbema, LF Marshall, AI Maas. Regional differences in patient characteristics, case management and outcomes in traumatic brain injury: experience from the tirilazad trials. J Neurosurg, 2002, Vol. 97, pp. 549–557.
8. TW Barrett, DL Schriger. Clinical Prediction Rules: Answers to the November 2009 Journal Club. Annals of Emergency Medicine, April 2010, Vol. 55, Issue 4, pp. 380–389.
9. BA Gregson, GD Murray, PM Mitchell, EN Rowan, AR Gholkar, AD Mendelow. Update on the Surgical Trial in Lobar Intracerebral Haemorrhage (STICH II): statistical analysis plan. Trials, 2012, Vol. 13, p. 222.