"The Adventure of the Golden Standard"

We have all been told ghost stories and fairy tales. Campfire fables intended to frighten the gullible populace into behaving in a manner deemed appropriate. Even in Emergency Medicine we have our fair share of ghost stories. Most notably, we are taught from an early age to fear and respect the clinically occult pulmonary embolism. A disease process so cryptic in nature that it can go undetected throughout a patient’s Emergency Department stay and yet is deadly enough to strike a patient down shortly after their discharge. Though such a monster exists, at least anecdotally, it certainly does not strike with the alacrity these tales would have you believe. As with any evil spirit that cannot be detected through normal measures, we have developed our own set of wards and charms in the hopes of keeping this demon at bay. One of our more frequently (over)used charms of this type is the serum D-Dimer. Armed with its protection, we go to work every day ready to battle the mythical beast that is the clinically occult pulmonary embolism.

A recent publication in JAMA by Righini et al sought to expand D-Dimer’s role in the eradication of venous thromboembolism (VTE) (1). In order to address the poor specificity of D-Dimer, experts have suggested increasing the threshold at which the assay is considered positive. Some have recommended doubling the threshold traditionally considered normal, while others propose using an age adjustment to account for the natural increase in serum levels with aging. Most of the data examining these strategies are retrospective in nature (2), and until this recent JAMA paper we had no prospective literature validating their efficacy. Righini et al examined the age-adjusted strategy using a level of 10 multiplied by the patient’s age (in years) as their threshold for a positive D-Dimer. Patients whose D-Dimer level was below their age-adjusted threshold had no further testing performed, while those above this threshold went on to more definitive testing. Using a gold standard of PEs diagnosed by CT Pulmonary Angiogram (CTPA), V/Q scan or 3-month follow up, the authors examined the age-adjusted approach. The authors claim a missed VTE rate at 3-month follow up of 0.3%. Additionally, employing this age-adjusted threshold in low-risk patients over 75 years of age increased the specificity of the assay from 6.9% to 29.7%. Seemingly a landmark trial, this publication should reduce testing and allow for D-Dimer to be more clinically applicable in an older population. Unfortunately this paper’s success may just as likely be due to a low-risk cohort, an imperfect gold standard and a limited definition of clinically positive events during follow up.
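The age-adjusted rule described above reduces to a very small calculation. The following is a minimal sketch rather than the trial's actual protocol: the function names are hypothetical, and the µg/L units, the conventional 500 µg/L cutoff for younger patients, the starting age of 50 years and the handling of a result exactly at the cutoff are all assumptions not stated in this article.

```python
def age_adjusted_cutoff(age_years, conventional_cutoff=500.0, start_age=50):
    """Return the D-Dimer cutoff for a patient of the given age.

    Assumptions (not taken from the article): values are in ug/L, the
    conventional cutoff is 500, and the adjustment starts at age 50.
    """
    if age_years < start_age:
        return conventional_cutoff
    # The rule described in the text: 10 multiplied by the age in years.
    return 10.0 * age_years


def needs_further_testing(d_dimer, age_years):
    """True when the D-Dimer is at or above the patient's cutoff.

    Treating a result exactly at the cutoff as positive is an assumption.
    """
    return d_dimer >= age_adjusted_cutoff(age_years)


if __name__ == "__main__":
    # Same hypothetical D-Dimer value, three different ages.
    for age in (40, 60, 80):
        cutoff = age_adjusted_cutoff(age)
        print(age, cutoff, needs_further_testing(650.0, age))
```

Under these assumptions a single D-Dimer value can be called positive in a 60-year-old and negative in an 80-year-old, which is exactly how the adjusted threshold reduces downstream imaging in older patients.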

Though D-Dimer has experienced some degree of success in the recent literature, it has not always garnered such favor. In fact, D-Dimer never achieved the diagnostic accuracy necessary for universal clinical use. Even the most sensitive assays were found to be incapable of safely ruling out pulmonary embolism in an undifferentiated cohort of patients suspected of having a PE (3,4,5,15). In one of the few trials that randomized hospital wards to encourage use of D-Dimer, compared to control wards where D-Dimer testing was discouraged, the authors found that widespread utilization did the exact opposite of what was intended. Not only did the number of evaluations for PE nearly double in the experimental arm compared to the control, the number of V/Q scans increased as well. Even more surprising, while the experimental arm diagnosed and treated significantly more patients for PE (160 vs 94), there was no difference in 3-month mortality or recurrent VTE (6). And yet despite these obvious flaws we could not let go. The physiological reasoning and clinical convenience of such a test were too attractive for us to abandon this assay as a failure. Instead we adapted our patients to fit the test. With a few minor adjustments of incidence, a small modification in the gold standard and a certain amount of looking the other direction when it came to clinical follow up, the D-Dimer was transformed into a highly sensitive assay capable of ruling out PE and reducing invasive testing.

We know from early studies of D-Dimer assays that their sensitivity is only sufficient to rule out PE in cohorts in which the pre-test probability is around 10-15% (3,15). Traditionally this was accomplished by using a low-risk Wells score of 2 or less. This strategy was first validated in a study by Wells et al published in Annals of Internal Medicine in 2001 (5), in which the authors hypothesized that using a low-risk Wells score of 0-2 in conjunction with a D-Dimer assay would reduce further downstream testing. The overall incidence of pulmonary embolism in this cohort was 9.5%. As expected, D-Dimer performed admirably in such a low-risk cohort. The overall negative predictive value was 97.3%, which was powered primarily by the scarcity of disease in the low-risk group (1.3%). In fact, when the test was used in the moderate and high-risk groups its negative predictive value fell to 93.9% and 88.5% respectively. The overall sensitivity of the D-Dimer in the entire cohort was only 78.5%. Such statistical machinations are relevant because the success of D-Dimer in the modern literature is driven largely by the use of negative predictive value in combination with low-risk cohorts to overestimate D-Dimer’s diagnostic capabilities. This acceptance of the negative predictive value as the endpoint of significance has tainted the literature examining D-Dimer’s effectiveness. Though Wells et al were forthright in reporting the true test characteristics of D-Dimer, later studies have not been so transparent. Most notable was the validation cohort published by the Christopher group in 2006 (7). In this cohort the authors set out to demonstrate that PE could be safely excluded in patients with Wells scores up to 4 using a D-Dimer. Similar to the Wells et al cohort, these authors used the 3-month VTE event rate in patients discharged with a negative D-Dimer. Unlike the Wells cohort, patients with a Wells score of 4 or less and a negative D-Dimer had no further testing.
The authors claim success, emphasizing that the 3-month event rate in the negative D-Dimer group was only 0.5%. Again this negative predictive value is powered by the low incidence of disease in this cohort (12.1%). The actual sensitivity in this subgroup was 95%. This pattern is consistent throughout the PE literature. The incidence of pulmonary embolism in prospective cohorts has been progressively decreasing over the past few decades. In the original PIOPED cohort, published by Stein et al in 1990, the high-risk, intermediate-risk and low-risk groups had a rule-in rate of 68%, 30% and 9% respectively (8). In contrast the PERC validation cohort, published in 2008 by Kline et al, had a rule-in rate of 31.1%, 10.4%, and 3% respectively (9). Obviously this decrease in incidence is due to our dwindling tolerance for risk and the subsequent inclusion of a far lower risk patient population into the diagnostic pathway. This dilution of the disease state and the focus on negative predictive value as the metric of choice provide a false impression of D-Dimer’s capabilities. This test appears to be safe for use in moderate-risk patients when in reality very few moderate-risk patients have been included in these cohorts.
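The dependence of negative predictive value on the incidence of disease follows directly from its definition and can be checked with a few lines of arithmetic. This is only an illustrative sketch: the 78.5% sensitivity is the overall figure quoted above from the Wells cohort, the prevalence values are the incidences quoted in the text, and the 70% specificity is a hypothetical placeholder chosen for illustration, not a figure from any of the cited studies.

```python
def negative_predictive_value(sensitivity, specificity, prevalence):
    """NPV = true negatives / (true negatives + false negatives)."""
    true_negatives = specificity * (1.0 - prevalence)
    false_negatives = (1.0 - sensitivity) * prevalence
    return true_negatives / (true_negatives + false_negatives)


SENSITIVITY = 0.785  # overall sensitivity quoted from the Wells cohort (5)
SPECIFICITY = 0.70   # hypothetical value, assumed for illustration only

if __name__ == "__main__":
    # Incidences quoted in the text: Wells low-risk group, Wells overall,
    # PIOPED intermediate-risk and PIOPED high-risk groups.
    for prevalence in (0.013, 0.095, 0.30, 0.68):
        npv = negative_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
        print(f"prevalence {prevalence:.1%} -> NPV {npv:.1%}")
```

With the test characteristics held fixed, the negative predictive value falls steadily as the incidence rises, which is why a mediocre assay can look excellent in a cohort where the disease is scarce.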

The second major flaw in the modern literature of D-Dimer is the gold standard used to define these thromboembolic events. Most notable for our current discussion is the use of CTPA as the gold standard test for diagnosing PE and the subtle, yet real, increase in overdiagnosis that has resulted from its adoption. The reclassification of clinically insignificant clot burden as a pathological state not only leads to overtreatment, transforming healthy people into patients, but it also makes it incredibly difficult for us to assess the effectiveness of any diagnostic pathway. To understand the repercussions the adoption of CTPA as the accepted gold standard has had on clinical research, we must first address its limitations. In PIOPED II, the largest trial examining the diagnostic characteristics of CTPA, published in the NEJM in 2006, Stein et al found that in patients at low risk of pulmonary embolism by clinical assessment, the CTPA diagnosed far more PEs than the composite reference standard (normal DSA or V/Q scan, a low probability V/Q with Wells <2 or a negative LE US). In fact, in patients with a Wells <2, 42% of the PEs diagnosed by CTPA were false positive findings. This is a significant increase over what would be considered a pulmonary embolism by the standard diagnostic criteria of the day. Conversely, in the high-risk patients the CTPA was not sensitive enough to safely rule out PE. In patients with a Wells score >6, 40% of the negative CTPAs were false negatives (10).

Despite these significant flaws, the CTPA has now become the gold standard by which the D-Dimer is judged. A gold standard that is prone to overdiagnosing low-risk patients with clinically irrelevant emboli and underdiagnosing high-risk patients with clinically relevant ones. Not only is this a poor standard to guide clinical judgment, but when used as the gold standard comparator it leads to an overestimation of D-Dimer’s utility. Early examination of the accuracy of various D-Dimer assays found at best a moderate ability to rule out PE. When pre-CTPA gold standards were used (DSA, V/Q scan and serial US), a negative D-Dimer in a high-risk patient in whom PE was suspected was not sufficient to rule out disease (5). In such cohorts, only in patients with a Wells score <2 could a D-Dimer be used to rule out PE. And so a portion of the PEs in moderate-risk patients that would be detected by the more traditional composite endpoint are in turn missed by the CTPA. This overestimates the sensitivity of the D-Dimer assay. Similar to D-Dimer, the CTPA tends to overdiagnose pulmonary embolism in the low-risk patient. This helps mask the true extent of D-Dimer's poor specificity. Overall the CTPA is a gold standard designed to present an overly optimistic view of the D-Dimer assay.
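The effect of a reference standard that itself misses emboli can be shown with a short worked example. The counts below are entirely hypothetical and are not drawn from any cited study, and the sketch assumes that every PE the D-Dimer flags is also confirmed by the reference standard, so that only the emboli missed by both tests drop out of the denominator.

```python
def true_sensitivity(true_pe, ddimer_detected):
    """Sensitivity measured against every embolism actually present."""
    return ddimer_detected / true_pe


def apparent_sensitivity(true_pe, ddimer_detected, missed_by_both):
    """Sensitivity measured against an imperfect reference standard.

    `missed_by_both` counts emboli that the D-Dimer missed and that the
    reference standard also failed to find, so they are never recorded
    as D-Dimer false negatives.
    """
    reference_positive = true_pe - missed_by_both
    return ddimer_detected / reference_positive


if __name__ == "__main__":
    # Hypothetical cohort: 100 real emboli, D-Dimer positive in 80 of them,
    # and the reference standard misses 10 of the 20 D-Dimer-negative emboli.
    print(true_sensitivity(100, 80))
    print(apparent_sensitivity(100, 80, 10))
```

In this hypothetical cohort the assay's performance has not changed at all, yet its measured sensitivity rises simply because some of its misses are never counted.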

The Righini et al trial committed all of the aforementioned errors in its examination of age-adjusted D-Dimer thresholds. Though the overall incidence of PE was high by modern standards (18.7%), the authors did not specifically state the incidence of PE in the subgroup in which D-Dimer was used to rule out disease, and thus it is hard to determine how the acuity level of the cohort affected the negative predictive value. The only criterion available for us to judge the acuity of each subgroup is the quantity of patients stratified to each respective risk group. In the Righini study only 12.8% of the patients had a Wells score greater than 4 (1). In contrast, the Christopher cohort had 33.2% of their patients with a Wells score greater than 4 (7). The mortality in the high-risk group following a negative CTPA at 3-month follow up was 1.2% in the Righini cohort compared to 8.6% in the Christopher study. This information suggests the Righini cohort is composed of a far healthier patient population than that of the Christopher trial. Following in the Christopher trialists’ footsteps, the authors used positive findings on CTPA or any event or death during the 3-month follow up period deemed due to a VTE (as determined by three independent experts blinded to the patient’s initial diagnostic workup) as their surrogate gold standard. Though the authors claim that only one event was missed at 3-month follow up in the patients discharged from the ED using the age-adjusted threshold, further examination reveals that in fact seven deaths and seven suspected VTEs occurred in this group, only one of which was deemed to be VTE-related by the expert panel. Though none of the seven deaths were judged to be related to pulmonary emboli, a number were attributed to COPD and end-stage cancer, both of which are easily confused with pulmonary embolism and commonly placed as the default diagnosis on death records (13).

In 1977, Annals of Internal Medicine published an editorial by Dr. Eugene Robin on the current state of PE management. Though all the diagnostic tests used to differentiate disease from non-disease have changed, the flaws in management have persisted (11). Specifically, we continue to obsess over diagnosing clinically unimportant pulmonary embolisms in the young and healthy while simultaneously ignoring the sick, vulnerable patients in whom PE is far more likely and clinically relevant. In August 2013, den Exter et al published an article in Blood supporting Dr. Robin’s thoughts (12). In this paper the authors examined the factors associated with recurrent pulmonary emboli and mortality in a cohort of 3,728 patients undergoing a work up for PE. The authors found that clot location, clot burden and even identification of clot on CTPA were not important factors when predicting clinical outcomes at follow-up. In fact the mortality during the follow-up period was 10.3% in those with a subsegmental PE vs 6.3% in those with a proximal PE vs 5.2% in those with a negative CTPA. The only factors that demonstrated clinically significant predictive value were history of malignancy, age and history of heart failure. Simply put, elderly patients with comorbidities are at an increased risk for clinically relevant pulmonary emboli. Similarly, the Christopher study reported that patients who were discharged after a negative CTPA had a mortality rate of 8.6%. No amount of testing can significantly modify this risk. Even those who do not have an embolic event diagnosed during their Emergency Department visit are at significant risk of experiencing an embolic event over the next 3 months. Clot burden, clot location or even presence of a clot on imaging did not predict clinical outcomes; patient variables did.

The D-Dimer assay is one of the many flawed tests in a flawed system built to identify pulmonary emboli in the young and healthy, in whom the diagnosis is rarely of clinical importance. Like the PERC rule, and even to some extent the CTPA, D-Dimer performs best in this young, healthy cohort with low risk of clinical disease. Conversely, in the sick and vulnerable high-risk patients it is rarely negative, and even if it is, it does not possess the diagnostic qualifications to safely rule out the disease of concern. In fact the only patient in whom D-Dimer can be consistently used is the young patient at low risk of pulmonary embolism. We are left with a test that possesses diagnostic characteristics capable of ruling out the presence of pulmonary embolisms of little clinical significance and incapable of ruling out the disease in the patients about whom we should be truly concerned. Clearly, despite the best of intentions, D-Dimer adds very little to the diagnostic pathway for PE. Playing with thresholds on the ROC curve does nothing to improve D-Dimer’s test characteristics. Its success is dependent on its ability to ward off a fictitious disease in a healthy population that will likely do well no matter what. A test best suited to treat our own fears rather than our patients’ maladies. Surely there is a better way to identify those who require workups for PE. Exactly what this consists of is still unclear, but certainly ghost stories, campfire tales and even D-Dimer assays will provide no assistance.

Sources Cited:

1. Righini M et al. Age-Adjusted D-Dimer Cutoff Levels to Rule Out Pulmonary Embolism: The ADJUST-PE Study. JAMA. 2014;311(11):1117-1124.

2. Schouten HJ et al. Diagnostic accuracy of conventional or age adjusted D-dimer cut-off values in older patients with suspected venous thromboembolism: systematic review and meta-analysis. BMJ 2013;346:f2492

3. Ginsberg JS et al. Sensitivity and Specificity of a Rapid Whole-Blood Assay for D-Dimer in the Diagnosis for Pulmonary Embolism. Annals of Internal Medicine. 1998; 129(12): 1006-1011

4. Stein PD et al. D-Dimer for the Exclusion of Acute Venous Thrombosis and Pulmonary Embolism. Annals of Internal Medicine. 2004; 140(8): 589-607

5. Wells PS et al. Excluding Pulmonary Embolism at the Bedside without Diagnostic Imaging: Management of Patient with Suspected Pulmonary Embolism Presenting to the Emergency Department by Using a Simple Clinical Model and D-Dimer. Annals of Internal Medicine. 2001; 135(2): 98-107

6. Goldstein NM et al. The Impact of the Introduction of a Rapid D-Dimer Assay on the Diagnostic Evaluation of Suspected Pulmonary Embolism. Arch Intern Med. 2001;161(4):567-571.

7. Writing Group for the Christopher Study Investigators*. Effectiveness of Managing Suspected Pulmonary Embolism Using an Algorithm Combining Clinical Probability, D-Dimer Testing, and Computed Tomography. JAMA. 2006;295(2):172-179.

8. The PIOPED Investigators. Value of the Ventilation/Perfusion Scan in Acute Pulmonary Embolism: Results of the Prospective Investigation of Pulmonary Embolism Diagnosis (PIOPED). JAMA. 1990;263(20):2753-2759.

9. Kline J.A. et al. Prospective multicenter evaluation of the pulmonary embolism rule-out criteria. Journal of Thrombosis and Haemostasis. 2008; 6(5): 772–780

10. Stein PD et al. Multidetector Computed Tomography for Acute Pulmonary Embolism. N Engl J Med 2006; 354:2317-2327.

11. Robin ED. Overdiagnosis and Overtreatment of Pulmonary Embolism: The Emperor May Have No Clothes. Ann Intern Med. 1977;87:775-781.

12. den Exter PL et al. Risk profile and clinical outcome of symptomatic subsegmental acute pulmonary embolism. Blood. 2013;122(7):1144-1149.

13. Wexelman BA et al. Survey of New York City Resident Physicians on Cause-of-Death Reporting, 2010. Prev Chronic Dis. 2013;10:E76.

14. Sohne M et al. Accuracy of clinical decision rule, D-dimer and spiral computed tomography in patients with malignancy, previous venous thromboembolism, COPD or heart failure and in older patients with suspected pulmonary embolism. J Thromb Haemost 2006; 4: 1042–6.

15. Gibson NS et al. The Importance Of Clinical Probability Assessment In Interpreting A Normal D-Dimer In Patients With Suspected Pulmonary Embolism. Chest. 2008;134(4):789-793.

16. Righini M et al. Effects of age on the performance of common diagnostic tests for pulmonary embolism. The American Journal of Medicine. Volume 109, Issue 5, 357-361