A Case of Shadows

In medicine we frequently propagate half-truths and unsubstantiated certainties. Truth thus becomes a relative experience, dependent primarily on how we choose to define it rather than on any concrete state of reality. Increasingly we have favored a technological definition of truth over the clinical perspective. As such, we are driven to act in disease states that are often best treated by blissful ignorance. Where we draw the line between clinically relevant and subclinical disease seems dependent on our own comfort with uncertainty. Given this culture, it is not surprising that bedside ultrasound (US) has become a popular tool to evaluate the majority of ailments that may show up in the Emergency Department. As our technical skills have expanded, so too has our comfort in using this modality to make clinical decisions. We have now achieved such a level of technical proficiency that we have outpaced the literature base needed to guide these decisions. Until recently the majority of the literature addressing bedside US has been limited by its use of surrogate endpoints and disease-oriented definitions of success. Thus we stand at a crossroads in Emergency Medicine. This is not intended to discredit bedside US as a modality, but rather as a commentary on its users and our inability to separate clinically relevant reality from the pixelated truth we see on our monitors. The question is, how exactly should we determine our sonographic definition of truth?

A recent article published in The Lancet Respiratory Medicine, by Laursen et al, is the first randomized controlled trial examining the effect of bedside US on patient outcomes (1). Up until the publication of this article, the efficacy of US was evaluated through studies addressing its diagnostic accuracy. US was compared to a more traditional diagnostic tool, often using an impossible gold standard. In many cases US proved comparable or even superior to the traditional diagnostic modality. These types of studies helped us define the potential utility of bedside US, but we have outgrown these humble beginnings. What is now required are trials examining the patient-centered effects of incorporating bedside US into our practice.

The findings of the Laursen trial were covered in more detail in my previous post on EM Literature Of Note and examined in an even more expert fashion by Simon Carley on the St. Emlyn's blog. I have included an excerpt from my post as a summary of these findings:

Authors randomized patients presenting to the ED with signs or symptoms concerning for a respiratory etiology to either a standard work up as determined by the treating physician or the addition of POCUS performed by a single experienced operator. The US protocol consisted of sonographic examination of the heart, lungs and lower extremity deep veins to identify possible causes of patients' symptoms. The authors' primary outcome was the percentage of patients with a correct presumptive diagnosis 4 hours after presentation to the Emergency Department as determined by two physicians blinded to ED POCUS findings, but with access to the records of the entire hospital stay.

Using this POCUS protocol the authors found stunning success in their primary endpoint. Specifically, the rate of correct diagnoses made at 4 hours in the POCUS group was 88% compared to 63.7% in the standard work up group. Furthermore, 78% of the patients in the POCUS group received “appropriate” treatment in the Emergency Department compared to 56.7% in the standard work up group.

Though promising, these diagnostic gains did not translate into improvements in true patient-oriented outcomes. Though not statistically significant, the observed in-hospital and 30-day mortality trended towards harm in the POCUS arm (8.2% vs 5.1% and 12% vs 7% respectively). Nor was there any meaningful difference in length of stay or hospital-free days between those in the POCUS group and those in the control group. Even more concerning was the significant increase in downstream testing in patients randomized to the POCUS group, specifically chest CTs (8.2% vs 1.9%), echocardiograms (10.1% vs 3.8%) and diagnostic thoracentesis (5.7% vs 0%).

It is important to note the pathologies found in the POCUS group were not false positives. These patients had additional diagnostic tests confirming the validity of the bedside findings. As such, this is not a question of technical competency, but rather a question of clinical relevancy. The significant increase in diagnostic proficiency found in the POCUS group did not result in improved patient-oriented outcomes; in fact, there were trends towards harm in both in-hospital and 30-day mortality, though these did not reach statistical significance. This, of course, may be statistical whimsy. Future trials may show this to be nothing more than the random noise generated by a small sample size, but these findings are concerning for a certain degree of overdiagnosis.

The Laursen trial is not a solitary signal standing out from a crowd of contrary data. There have been signs throughout the US literature demonstrating the potential for overdiagnosis, and though not definitive, this study certainly supports that hypothesis. When US is compared to CXR for the diagnosis of pneumonia, it reveals far more pathology (2). Does this mean we have been missing a large portion of pneumonias in otherwise well-appearing patients, or is this an example of overdiagnosis? Likewise, US is a far more sensitive modality for identifying pneumothoraxes when compared to CXR (3). And yet, as with pneumothoraxes that are discovered on CT but not seen on CXR, there is a question of whether such lesions require any intervention at all. What we do with this information is hard to say. None of these trials is robust enough to support definitive conclusions. Despite their many flaws, surely we can no longer say with overwhelming certainty that ultrasound is free and harmless. As with any other test, it is only as good as the practitioners who use it.

A recent article by Kanji et al, published in The Journal of Critical Care, revealed bedside US to be a far more successful tool when used to guide care (4). These authors, utilizing a before-and-after design, examined the use of bedside echocardiography (echo) to guide resuscitative strategies in ICU patients presenting with pressor-dependent shock. Patients were prospectively evaluated over a 1-year period, the first 6 months forming the standard care group and the following 6 months the echo-guided group. The standard care group used the “Surviving-Sepsis-Protocol” to guide resuscitation, while the echo-guided group followed a protocol involving evaluation of cardiac function and inferior vena cava (IVC) collapsibility. Echo evaluations were conducted by one of three intensivists with expertise in the use of bedside echocardiography. None of the physicians performing the echo exams were the primary physicians caring for the patients; rather, they made recommendations based on their findings. These recommendations were consistent with one of four scenarios:
1. If LV function was normal and the IVC was full, fluid was stopped and pressors were continued.
2. If LV function was normal and the IVC was collapsible, a fluid bolus of 20-40 ml/kg was administered.
3. If LV function was impaired and the IVC was collapsible, 10-20 ml/kg of fluid was administered and dobutamine was initiated.
4. If LV function was impaired and the IVC was full, fluid was restricted and dobutamine was initiated.
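
The four scenarios above amount to a two-by-two decision table. The short Python sketch below is illustrative only: the function name `echo_recommendation` and its boolean inputs are hypothetical, and it encodes the scenarios as summarized in this post rather than the trial's own protocol document. It is not clinical guidance.

```python
def echo_recommendation(lv_impaired: bool, ivc_collapsible: bool) -> str:
    """Map the two echo findings to one of the four scenarios listed above.

    Hypothetical helper for illustration; the return strings paraphrase
    the scenarios as described in this post.
    """
    if not lv_impaired and not ivc_collapsible:
        # Scenario 1: normal LV function, full IVC
        return "stop fluid; continue pressors"
    if not lv_impaired and ivc_collapsible:
        # Scenario 2: normal LV function, collapsible IVC
        return "fluid bolus 20-40 ml/kg"
    if lv_impaired and ivc_collapsible:
        # Scenario 3: impaired LV function, collapsible IVC
        return "fluid 10-20 ml/kg; start dobutamine"
    # Scenario 4: impaired LV function, full IVC
    return "restrict fluid; start dobutamine"


if __name__ == "__main__":
    for lv in (False, True):
        for ivc in (False, True):
            print(lv, ivc, "->", echo_recommendation(lv, ivc))
```

Written this way, it is clear that each recommendation answers a specific question (is the ventricle failing, and is the patient likely to tolerate more fluid?) rather than reporting whatever the probe happens to find.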

The primary outcome the authors examined was 28-day mortality. Secondary endpoints were the amount of fluid administered over the first four days of treatment, organ dysfunction and days free of renal replacement therapy. A total of 220 patients were examined (110 in the standard therapy group and 110 in the echo-guided group). The vast majority of the patients evaluated were in vasodilatory shock, followed by a small minority in cardiogenic shock and a handful of patients in mixed or hemorrhagic shock. Of the patients in the echo-guided group, 25% were found to have severely impaired left ventricular function and only 35% were deemed to require fluid augmentation as determined by IVC collapsibility. As such, patients in the echo-guided group received significantly less fluid over the first day of therapy (49 ml/kg vs 66 ml/kg) and were more likely to be started on dobutamine therapy than those in the standard care group (22% vs 12%).

28-day mortality was 66% vs 56% in the standard and echo-guided groups respectively. This 10 percentage point difference reached statistical significance with a P value of 0.04. Furthermore, patients in the echo-guided group had more days free of renal replacement therapy (RRT) and less grade 3 acute kidney injury (AKI).

This trial is by no means without its limitations. The before-and-after design and small sample size, not to mention the questionable efficacy of dobutamine, limit the strength of the conclusions that can be drawn. Despite these drawbacks, like the Laursen et al trial, the Kanji trial sets an important precedent in the US literature. Rather than examining US’s utility using a surrogate disease-oriented endpoint, both of these trials investigated the effect US had on patient-oriented outcomes, specifically mortality.

Though these two trials examine two very different aspects of bedside ultrasonography, their distinction serves to illustrate our point. In the Laursen et al trial, all patients presenting with respiratory signs or symptoms underwent a protocolized ultrasonographic investigation independent of individual presentations. This shotgun distribution of sound waves is the equivalent of throwing a bunch of labs at a belly pain patient and seeing what sticks. Finding something on US and then retrospectively fitting the patient to these findings will inevitably lead us down many false paths. Kanji et al also used a standardized protocol, but unlike the Laursen trial, they asked a specific clinical question pertinent to the patient’s presentation and used US to answer this question.

As with any form of testing, the acuity of the patient and the pretest probability of disease determine the performance of the investigation. Even the most specific tests, if used on the wrong population, will identify more false positives than true disease. It is my belief that bedside US is even more susceptible to these conditional circumstances. In the crashing trauma patient, US becomes an invaluable tool to swiftly rule out tension pathology as the cause of the physiological insult (3). Conversely, when used in a patient with a more clinically benign presentation, the high sensitivity we so recently relied on becomes a liability, as it is now prone to finding pneumothoraxes of little clinical relevance. Overall the sensitivity of US in the identification of appendicitis is fair, but as the disease process progresses and the clinical suspicion increases, the sensitivity of the test becomes far more clinically useful (5). The EFAST exam, when applied to patients with traumatic injury, has a poor sensitivity for identifying injury (6,7), but when used to identify the cause of a crashing trauma patient’s hypotension it is clinically invaluable (8). Interestingly, in the hypotensive patient, where the pretest probability of clinically relevant pathology is extremely high, the potential for overdiagnosis from empirically applying standardized screening protocols such as the EFAST or RUSH exam becomes much less relevant.
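
The dependence on pretest probability can be made concrete with Bayes' theorem. The sketch below is a worked example under assumed numbers: the 90% sensitivity, 95% specificity and the two prevalence values are hypothetical, chosen only for illustration, and are not taken from any of the studies cited here.

```python
def positive_predictive_value(sensitivity: float,
                              specificity: float,
                              prevalence: float) -> float:
    """Probability that a positive test reflects true disease (Bayes' theorem)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical test characteristics, for illustration only.
SENSITIVITY = 0.90
SPECIFICITY = 0.95

# Same test, two populations: low vs high pretest probability (assumed values).
ppv_low = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence=0.01)
ppv_high = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence=0.50)

if __name__ == "__main__":
    print(f"PPV at 1% pretest probability:  {ppv_low:.2f}")   # about 0.15
    print(f"PPV at 50% pretest probability: {ppv_high:.2f}")  # about 0.95
```

With these assumed figures, roughly five of every six positive results in the low-probability population are false, while in the high-probability population nearly all positives are real. The test has not changed, only the patients it was applied to.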

How do we move forward? US has traditionally been examined as a diagnostic test, meaning its utility is routinely compared to a gold standard. US studies of pneumonias, pneumothoraxes, appendicitis, or peritoneal injury are commonly evaluated against CT. Bedside echo is typically compared to comprehensive echocardiography as interpreted by an “expert Cardiologist”, and measurements of fluid responsiveness are likened to invasive hemodynamic monitoring. Each of these gold standards has its own flaws. CT scans are prone to overdiagnosing (3,4), Cardiologists disagree with each other as often as they disagree with Emergency Physicians when diagnosing heart failure (9), and the invasive hemodynamic measurements used to judge US’s ability to assess fluid responsiveness have not been shown to improve patient-oriented outcomes when examined clinically (10). We need to utilize patient-relevant outcomes when evaluating the use of bedside US in order to assess its true value as a diagnostic tool. Future research should randomize patients with US+, CXR- pneumonias to antibiotic therapy or placebo, compare conservative management to chest tube insertion in patients found to have pneumothoraxes on US but not CXR, and assess fluid responsiveness in the hemodynamically volatile patient by examining mortality outcomes when US findings are used to guide therapy.

It is an exciting time in the world of point-of-care US. There are great minds with extraordinary vision pushing this field forward every day. It is a privilege to experience this progression. But as technology advances and the quality of our point-of-care machinery improves, overdiagnosis will become an ever more pressing concern. If we choose to stick our heads in the sand, holding fast to the unquestionable certainty found in our pareidolic interpretation of shadows, we will surely redefine medical truth for the worse. Just as CTPA once changed the diagnosis of pulmonary embolism from a clinically relevant, dangerous disease to a primarily irrelevant disease-oriented definition, point-of-care US will identify a large quantity of subclinical disease of questionable clinical bearing. Conversely, if we choose to continue to question the proper application of point-of-care US and focus not only on our procedural expertise but on our medical stewardship, we will advance the field of bedside US and improve patient care. If we are to claim clinical expertise, our knowledge must extend beyond technical proficiency and integrate the wisdom needed to interpret these shadows.

Sources Cited:

1. Laursen et al. Point-of-care ultrasonography in patients admitted with respiratory symptoms: a single-blind, randomised controlled trial. Lancet Respir Med. 2014;2(8):638-646.
2. Bourcier et al. Performance comparison of lung ultrasound and chest x-ray for the diagnosis of pneumonia in the ED. Am J Emerg Med. 2014;32(2):115-8.
3. Alrajab et al. Pleural ultrasonography versus chest radiography for the diagnosis of pneumothorax: review of the literature and meta-analysis. Crit Care. 2013;17(5):R208.
4. Kanji et al. Limited echocardiography-guided therapy in subacute shock is associated with change in management and improved outcomes. J Crit Care. 2014;29(5):700-5.
5. Bachur et al. The effect of abdominal pain duration on the accuracy of diagnostic imaging for pediatric appendicitis. Ann Emerg Med. 2012;60(5):582-590.e3.
6. Quinn et al. What is the utility of the Focused Assessment with Sonography in Trauma (FAST) exam in penetrating torso trauma? Injury. 2011;42(5):482-7.
7. Becker et al. Is the FAST exam reliable in severely injured patients? Injury. 2010;41(5):479-83.
8. Laselle et al. False-negative FAST examination: associations with injury characteristics and patient outcomes. Ann Emerg Med. 2012;60(3):326-34.e3.
9. Januzzi et al. The N-terminal Pro-BNP investigation of dyspnea in the emergency department (PRIDE) study. Am J Cardiol. 2005;95(8):948-54.
10. Harvey et al. Assessment of the clinical effectiveness of pulmonary artery catheters in management of patients in intensive care (PAC-Man): a randomised controlled trial. Lancet. 2005;366(9484):472-7.