EM Nerd-A Case of Central Tendencies

The REACT-2 trial is an RCT comparing two different diagnostic strategies in a cohort of patients presenting to the Emergency Department with severe trauma (1). And while the authors found no difference in clinical outcomes between patients randomized to total-body vs selective imaging strategies, they did report a statistically significant difference in the total dose of radiation exposure between the two imaging strategies.

The question that has come up frequently since this trial’s publication is: how clinically relevant is this statistical difference? The median radiation dose received during the initial resuscitation was 20.9 mSv in the total-body imaging group and 20.6 mSv in the selective imaging group (p < 0.0001). Despite its impressive p-value, how clinically important is a 0.3 mSv difference? To answer this question, a brief discussion of measures of central tendency and the statistics used to compare them is necessary.

Traditionally, when trying to determine whether a continuous variable differs between two cohorts, we compare measures of central tendency (2). The standard t-test compares two mean values and estimates the probability of observing a difference at least as large as the one seen if chance alone were at work (fig. 1). This probability is reported as the p-value. But for a t-test to be valid, the values surrounding the two means must follow an approximately normal distribution. If a sample is not distributed in the familiar bell-shaped curve, two very similar means can represent two very different cohorts (fig. 2).

Fig. 2
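To make the point concrete, here is a minimal Python sketch. The two samples are invented for illustration (they are not REACT-2 data), and `welch_t` is a hypothetical helper that computes only the t statistic, not a p-value. Both samples have the same mean, so the comparison of means sees no difference at all, even though the shapes of the two cohorts are plainly different.

```python
from math import sqrt
from statistics import mean, median, variance

# Hypothetical doses in mSv, invented for illustration; NOT REACT-2 data.
symmetric = [18, 19, 20, 20, 20, 20, 21, 22]  # clustered evenly around 20
skewed = [8, 8, 24, 24, 24, 24, 24, 24]       # mostly high, with a low tail


def welch_t(a, b):
    """Welch's t statistic: difference in means divided by its standard error."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


print("means:      ", mean(symmetric), mean(skewed))
print("medians:    ", median(symmetric), median(skewed))
print("t statistic:", welch_t(symmetric, skewed))
```

Because the means are identical, the t statistic is zero, yet the medians and the spread of the two samples differ considerably.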

In the REACT-2 data set, the median radiation dose in the two groups was very similar (20.9 mSv in the total-body group vs 20.6 mSv in the selective scanning group) (1). If one were to compare these central tendencies alone, no clinically important difference between the imaging strategies would be observed. But a comparison of central tendencies does not describe the distribution of data around these median values (3). In the total-body CT group, the surrounding data points lie very close to the median value, with an interquartile range (IQR) of 20.6-20.9 mSv. This makes intuitive sense, as the entire group received CT imaging independent of their presenting complaints. In the selective scanning group, by contrast, which received only the imaging deemed clinically necessary by the treating physician, the IQR covers a far broader set of values (9.9-21.1 mSv). A simple comparison of median values does not truly convey the difference in radiation exposure.
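The relationship between a median and its IQR can be sketched in a few lines of Python. The two samples below are made up, chosen only so that their medians and quartiles match the summary values quoted above; they are not the trial's patient-level data, and `summarize` is a hypothetical helper.

```python
from statistics import median, quantiles

# Hypothetical doses in mSv, constructed to reproduce the quoted medians
# and IQRs; NOT the actual REACT-2 patient-level data.
total_body = [20.0, 20.5, 20.6, 20.8, 20.9, 20.9, 20.9, 21.0, 21.5]
selective = [2.0, 5.0, 9.9, 15.0, 20.6, 20.9, 21.1, 21.5, 22.0]


def summarize(doses):
    """Return (median, first quartile, third quartile) of a sample."""
    q1, _, q3 = quantiles(doses, n=4, method="inclusive")
    return median(doses), q1, q3


for name, doses in [("total-body", total_body), ("selective", selective)]:
    mid, q1, q3 = summarize(doses)
    print(f"{name:>10}: median {mid} mSv, IQR {q1}-{q3} mSv")
```

The medians differ by only 0.3 mSv, while the lower quartile of the selective sample sits more than 10 mSv below that of the total-body sample.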

When analyzing the radiation exposure, the authors did not directly compare central tendencies, but rather used a non-parametric analysis, the Mann-Whitney U test. Although its results are usually reported alongside the median values, the test itself is calculated from the ranks of all the observations and, unlike the t-test, is not a direct comparison of central tendencies. Rather, the Mann-Whitney U test is sensitive to both the central tendencies and the distribution of the surrounding values. It asks whether a randomly selected value from one population tends to be greater than a randomly selected value from the other, and its p-value denotes the probability of observing a difference in ranks at least as large as the one seen if neither population tended to have the higher values (3). Hence the p-value of <0.0001 reported in the REACT-2 cohort indicates that a randomly selected patient from the total-body CT group is more likely to have been exposed to a higher dose of radiation than a randomly selected patient from the selective imaging group, and that this tendency is very unlikely to be the product of chance. In fact, 45% of the patients in the selective scanning group received a lower total dose of radiation than the lowest dose of radiation (20 mSv) received by any patient in the total-body CT group.
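The quantity underlying the Mann-Whitney U test can be sketched directly. The code below counts, over every possible pairing of one patient from each group, how often the first patient's dose is higher (ties count as half). Dividing that count, U, by the number of pairs estimates the probability that a randomly chosen patient from the first group received the higher dose; under the null hypothesis this is 0.5. The samples are hypothetical, the function names are my own, and the sketch stops at this estimate rather than computing a p-value.

```python
# Hypothetical doses in mSv, invented for illustration; NOT REACT-2 data.
total_body = [20.0, 20.5, 20.6, 20.8, 20.9, 20.9, 20.9, 21.0, 21.5]
selective = [2.0, 5.0, 9.9, 15.0, 20.6, 20.9, 21.1, 21.5, 22.0]


def u_statistic(a, b):
    """Count pairs (x from a, y from b) with x > y; a tie counts as half."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def probability_of_superiority(a, b):
    """Estimate P(random value from a > random value from b) as U / (n_a * n_b)."""
    return u_statistic(a, b) / (len(a) * len(b))


print("U =", u_statistic(total_body, selective))
print("P(total-body dose > selective dose) =",
      round(probability_of_superiority(total_body, selective), 2))
```

With these made-up numbers the estimate is about 0.59 even though the medians differ by only 0.3 mSv, because every pairing contributes to U, including those involving the low doses in the selective sample.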

Fig. 3

Schriger et al discussed the limitations of descriptive statistics in a 2001 article published in the Annals of Emergency Medicine entitled “Achieving Graphical Excellence: Suggestions and Methods for Creating High-Quality Visual Displays of Experimental Data”. In this paper the authors promote a graphical depiction of data over the more commonly employed descriptive statistics. Take, for example, two very different depictions of the REACT-2 dataset*. Fig. 3 presents the median radiation exposure reported in both the total-body and selective imaging groups. Although all the data from the published paper are represented in this table, it does not enhance our understanding of the differences between these two groups. Conversely, if the data were presented in the fashion of fig. 4, it is easy to see that while the median radiation doses in the two groups are similar, the distributions of the data surrounding these central tendencies are distinctly different.

Fig. 4
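Even a crude plot makes the same point. The following sketch draws a text histogram of two hypothetical samples (again, invented values rather than the trial's data) using 5 mSv bins. The helper names and the bin width are arbitrary choices for illustration.

```python
# Hypothetical doses in mSv, invented for illustration; NOT REACT-2 data.
total_body = [20.0, 20.5, 20.6, 20.8, 20.9, 20.9, 20.9, 21.0, 21.5]
selective = [2.0, 5.0, 9.9, 15.0, 20.6, 20.9, 21.1, 21.5, 22.0]


def bin_counts(doses, width=5, n_bins=5):
    """Count doses per bin; anything above the last bin edge lands in the last bin."""
    counts = [0] * n_bins
    for dose in doses:
        index = min(int(dose // width), n_bins - 1)
        counts[index] += 1
    return counts


def text_histogram(label, doses, width=5, n_bins=5):
    """Render one row per bin, with one '#' per patient in that bin."""
    lines = [label]
    for i, count in enumerate(bin_counts(doses, width, n_bins)):
        low = i * width
        lines.append(f"{low:>2}-{low + width:<2} mSv | " + "#" * count)
    return "\n".join(lines)


print(text_histogram("Total-body CT", total_body))
print(text_histogram("Selective CT", selective))
```

All of the total-body doses fall in the 20-25 mSv row, while the selective doses are spread across the lower rows as well, a difference the two medians alone do not reveal.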

We use descriptive statistics in an attempt to reduce the complexity of an entire dataset to a few select informative values. But often such a reduction cannot be undertaken without a critical loss of the information held in the underlying data, as is the case with the REACT-2 dataset. Patients randomized to the total-body imaging strategy were frequently exposed to larger doses of radiation than those in the selective imaging group. This excess radiation exposure is not readily conveyed by the descriptive methods employed by the trial’s authors, but represents a true and clinically important harm imposed on patients who undergo empiric total-body irradiation strategies.

Sources Cited:

1. Sierink JC, et al. Immediate total-body CT scanning versus conventional imaging and selective CT scanning in patients with severe trauma (REACT-2): a randomised controlled trial. Lancet. June 28, 2016.
2. Sedgwick P. Parametric v non-parametric statistical tests. BMJ. 2012;344:e1753.
3. Hart A. Mann-Whitney test is not just a test of medians: differences in spread can be important. BMJ. 2001;323(7309):391-3.
4. Schriger DL, Cooper RJ. Achieving graphical excellence: suggestions and methods for creating high-quality visual displays of experimental data. Ann Emerg Med. 2001;37(1):75-87.

*These graphs are not intended to be actual representations of the REACT-2 trial’s results as I do not have access to the dataset in its entirety. Rather they are meant to illustrate the concepts discussed in this post.