The practice of Frequentist statistics is often a study in extremes. Based on an arbitrary threshold of significance, we are asked to interpret data as either positive or negative when in reality it merely shifts our probability of certainty. Even more important, because of the singular nature of Frequentist statistics, our interpretation of data is often constrained to the questions posed by those designing the trial. Although a strict deductive methodology is important to prevent mistaking random chance for scientific proof, it is equally important to understand in which instances abiding by these laws will lead to a misinterpretation and misunderstanding of the data.
Appendicitis has long been considered a surgical emergency. If it is not intervened upon surgically in a timely fashion the pathological sequelae will lead to perforation, sepsis, and death. And yet, despite this foregone conclusion, a number of trials have challenged the necessity of cold steel in the management of acute appendicitis. Most recently, in JAMA, Salminen et al published the findings from their RCT comparing the traditional surgical management of acute appendicitis to conservative treatment with antibiotic therapy alone (1). Despite the authors’ primary conclusion, this trial demonstrated that in patients with non-complicated acute appendicitis, the use of antibiotic therapy is anything but inferior.
Salminen et al randomized 530 patients with CT-confirmed non-complicated acute appendicitis to either surgical management, using primarily open laparotomy, or a short course of IV antibiotics (3 days of ertapenem) followed by a 7-day course of oral levofloxacin. Of the 273 patients randomized to the surgical group, 272 (99.6%) underwent successful appendectomy. Of the patients randomized to conservative therapy, 70 (27.3%) underwent appendectomy within one year of initial presentation. Let's pause for a moment. A disease process that for the past century has been considered a surgical necessity was treated successfully with antibiotics alone in 72.7% of patients (1). Despite these impressive numbers, the trial was deemed unsuccessful because the rate of “treatment failure” in the conservative group crossed the predetermined non-inferiority margin of 24%. And yet this statistical shortfall is based less on the inferiority of antibiotic therapy and more on the authors’ unfortunate choice of how exactly they defined “non-inferior”.
Non-inferiority trials are intended to ask a very specific question: whether a new treatment strategy or medical intervention is comparable to the traditional standard therapy. Rather than examine the two in the hopes of determining superiority, a non-inferiority trial merely attempts to establish that the new treatment is no worse than the current standard of care. This type of trial is undertaken when the new treatment provides certain advantages that would make it preferable to the old treatment (2,3). For example, if it is cheaper, safer, or less invasive, one might prefer to use the new treatment rather than expose the patient to the cost, risk, or intrusive nature of the prior strategy. In fact, depending on what advantages a new treatment may provide, one might accept some degradation in efficacy as long as it does not cross a predefined threshold for inferiority. This threshold is based upon a number of assumptions. First, what is the proven efficacy of the established standard? Say, for example, that in previous studies this standard demonstrated an absolute decrease in mortality of 5%, with a confidence interval surrounding this point estimate of 3% to 7%. You would not want your new intervention to be 3 percentage points less effective than the standard comparator, because the standard's true benefit may be as small as 3%, in which case the new intervention could prove no more beneficial than placebo. Second, what added benefits does this new therapy provide? If these advantages are impressive, then you may accept a greater degree of inferiority when compared to the standard treatment strategy (a more lenient non-inferiority margin). On the other hand, if the new treatment provides few novel advantages, you would likely accept far less deviation from the standard treatment’s efficacy.
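The two considerations above can be put into a short sketch. It takes the hypothetical 3% to 7% confidence interval from the example, treats the lower bound as the most conservative estimate of the standard's benefit, and sets the margin as a fraction of that benefit. The function name `fixed_margin` and the 50% fraction are illustrative assumptions, not values taken from any trial discussed here.

```python
def fixed_margin(ci_lower_bound, fraction_given_up):
    """Largest loss of efficacy we are willing to tolerate.

    ci_lower_bound    -- most conservative estimate of the standard's benefit
    fraction_given_up -- share of that benefit we accept losing (hypothetical)
    """
    if not 0.0 < fraction_given_up < 1.0:
        raise ValueError("fraction_given_up must lie strictly between 0 and 1")
    return ci_lower_bound * fraction_given_up


# Hypothetical standard therapy: 5% absolute mortality reduction, CI 3% to 7%.
standard_effect_ci = (0.03, 0.07)
conservative_effect = standard_effect_ci[0]

# Assumption: the new therapy's advantages justify giving up half of the
# smallest plausible benefit of the standard therapy.
margin = fixed_margin(conservative_effect, 0.5)

print(f"Conservative benefit of standard: {conservative_effect:.1%}")
print(f"Non-inferiority margin:           {margin:.1%}")
```

A larger fraction corresponds to a more lenient margin, which is only defensible when the new therapy offers substantial advantages in cost, safety, or invasiveness.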
Salminen et al utilized neither of these considerations when calculating their non-inferiority margin. In fairness to the authors, it would be exceedingly difficult to accurately assess the true efficacy of surgery over placebo, as this standard of care was established long before placebo-controlled trials were used to define treatment effect. Where the authors did falter was in the manner in which they determined their non-inferiority margin and performed their power calculation. Using data from prior studies examining the efficacy of antibiotic therapy in acute appendicitis, the authors estimated a 25% rate of treatment failure (defined as need for surgical intervention within one year of initial presentation) in the patients randomized to conservative treatment (1). Using this estimate, they set their non-inferiority margin at no more than 24% treatment failure in patients randomized to antibiotic therapy, essentially dooming their trial from its earliest power calculations.
Non-inferiority trials ask a different question than the traditional superiority trials to which we are more accustomed. Rather than presenting a null hypothesis that states there is no difference between the groups, the non-inferiority trial design operates under the assumption that the novel intervention is inferior to the standard treatment. The alternative hypothesis states that the novel treatment is no worse than the standard by more than the predefined margin. In order to reject the null hypothesis, the novel treatment must demonstrate near-equivalent efficacy within a degree of certainty. This means that both the point estimate and the surrounding confidence interval must fall on the acceptable side of the non-inferiority margin (2,3). In this case, despite all prior evidence demonstrating the contrary, the authors estimated that 275 patients per group would provide 90% power to demonstrate the non-inferiority of conservative management for acute appendicitis when compared to the more traditional surgical intervention. Essentially this translates into the non-surgical group having to demonstrate a point estimate of approximately 20% treatment failure within one year for the confidence interval not to cross the predefined non-inferiority margin. Further hampering their efforts, the authors halted the trial early after enrolling only 530 patients (rather than the 610 planned in the original power calculation), widening the already wide confidence interval surrounding their point estimate (1).
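A rough way to see where a figure of about 20% comes from is to ask how many failures a 275-patient antibiotic arm could have before the upper end of a 95% confidence interval around its failure rate reaches 24%. The sketch below is a simplification under stated assumptions: it looks at the antibiotic arm alone and uses a simple normal-approximation (Wald) interval, which is not necessarily the method the trial authors used. The function names are illustrative.

```python
import math

Z_95 = 1.96  # two-sided 95% normal quantile


def wald_upper_bound(failures, n, z=Z_95):
    """Upper end of a normal-approximation CI for a failure proportion."""
    p = failures / n
    return p + z * math.sqrt(p * (1 - p) / n)


def max_failures_within_margin(n, margin, z=Z_95):
    """Largest failure count whose CI upper bound stays below the margin."""
    best = None
    for failures in range(n + 1):
        if wald_upper_bound(failures, n, z) < margin:
            best = failures
        else:
            break  # the bound rises with the failure count in this range
    return best


n_per_arm = 275   # planned arm size quoted in the text
margin = 0.24     # non-inferiority margin quoted in the text

max_failures = max_failures_within_margin(n_per_arm, margin)
max_rate = max_failures / n_per_arm
print(f"Most failures tolerated: {max_failures} of {n_per_arm} ({max_rate:.1%})")
```

Under these assumptions the tolerated failure rate sits a little below 20%, well under the 25% failure rate the authors themselves anticipated.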
It should have come as no surprise that the authors failed to demonstrate non-inferiority by their designated definition. The authors found that 27% of patients randomized to antibiotic therapy required an appendectomy within 1 year of initial presentation. The 95% confidence interval surrounding this point estimate was 22.0% to 33.2% (1). In the two trials they used to justify their non-inferiority margin of 24%, the 1-year failure rate in patients treated with antibiotics was cited as 24% and 23.6% respectively (4,5). Unfortunately, in the latter of these two trials, by Hansson et al, this failure rate was calculated from the per-protocol analysis rather than the intention-to-treat analysis. In reality the antibiotic group had a 47.5% crossover rate to surgery. The overall failure rate in the intention-to-treat analysis was 60% (5). In an additional trial by Vons et al, published in the Lancet in 2011, the 1-year appendectomy rate was 37%. The 95% confidence interval around this point estimate ranged from 28.36% to 45.64% (6). The 2011 Cochrane analysis, after examining the 5 existing RCTs, found that 26.6% (95% confidence interval 18.1% to 37.3%) of the patients randomized to antibiotic therapy went on to have an appendectomy within 1 year of initial presentation (7). Given that the previous evidence indicates that the rate of antibiotic failure has consistently been greater than 25% and has ranged as high as 60%, the expectation by Salminen et al that they would find non-inferiority of antibiotic therapy with a non-inferiority margin of 24% was optimistic to say the least.
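The interval around the observed failure rate can be approximated from the counts quoted in this piece (70 appendectomies among 257 patients in the antibiotic arm). The sketch below uses a Wilson score interval, which is an assumption: the method behind the published interval is not stated here, so the output should land close to, but not exactly on, the reported figures.

```python
import math


def wilson_interval(events, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p = events / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


margin = 0.24  # non-inferiority margin quoted in the text

# Counts quoted in the text; the denominator is an assumption about which
# patients the published percentage was calculated on.
lower, upper = wilson_interval(70, 257)

print(f"Observed failure rate: {70 / 257:.1%}")
print(f"Approximate 95% CI:    {lower:.1%} to {upper:.1%}")
print(f"Upper bound exceeds the margin: {upper > margin}")
```

Whatever interval method is chosen, the upper bound lands far above 24%, which is why the trial could not meet its own definition of non-inferiority.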
More importantly, was the appendectomy rate at 1 year truly the most appropriate criterion with which to define inferiority? This trial was not negative because medical management proved to be inferior to surgical appendectomy; rather, it was negative because the authors asked the wrong question. As clinicians, what is our concern with the medical management of acute appendicitis? It is not whether 20% or 27% of those initially treated with antibiotics will eventually require an appendectomy, but rather whether medical therapy leads to an unacceptably high rate of serious complications. In fact, if we were to be completely equitable, while 99.6% of the patients in the surgical arm of this trial underwent appendectomy, only 27% of the patients in the medical management arm were exposed to an invasive procedure. The question the authors should have asked was, “How many patients in each arm experienced resolution of symptoms related to acute appendicitis without experiencing acute complications related to delays in treatment (perforation, abscesses, sepsis, etc)?” If the authors had asked this question, their answer would have been entirely different. Of the 257 patients randomized to medical management, 15 (5.8%) required appendectomy during their initial hospital admission. Only 5 (1.9%) patients in the antibiotic group experienced perforations requiring surgical intervention, compared to 2 out of 273 (0.7%) patients randomized to immediate surgical intervention (1). Essentially, with an absolute difference of about 1.2 percentage points, you would have to operate immediately on roughly 80 patients with non-complicated acute appendicitis in order to prevent one perforation.
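The perforation arithmetic can be worked directly from the counts quoted above (5 of 257 in the antibiotic arm, 2 of 273 in the surgical arm). This is a minimal sketch of a number-needed-to-treat calculation. It ignores the uncertainty around such small event counts, and the function names are illustrative.

```python
def absolute_risk_difference(events_a, n_a, events_b, n_b):
    """Risk in group A minus risk in group B."""
    return events_a / n_a - events_b / n_b


def number_needed_to_treat(risk_difference):
    """Patients who must receive the intervention to prevent one event."""
    if risk_difference == 0:
        raise ValueError("no risk difference, so the NNT is undefined")
    return 1.0 / abs(risk_difference)


# Perforations requiring surgery, as quoted in the text.
ard = absolute_risk_difference(5, 257, 2, 273)  # antibiotics minus surgery
nnt = number_needed_to_treat(ard)

print(f"Absolute risk difference: {ard:.2%}")
print(f"Immediate appendectomies per perforation prevented: about {nnt:.0f}")
```

With only seven perforations in total, the confidence interval around this estimate would be very wide, so it is best read as an order of magnitude rather than a precise figure.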
Certainly there is a great deal to be determined before this non-invasive strategy can be considered mainstream practice. This was a small, underpowered cohort in which the participating surgeons performed primarily open laparotomies. How this strategy translates to the US, where the primary approach to appendectomy is laparoscopic, is unclear. Additionally, whether patients require 3 days of broad-spectrum IV therapy followed by a 7-day course of oral therapy is unknown. What seems obvious is that, in what was once considered an exclusively surgical disease, the majority of patients can be managed effectively with conservative treatment. Despite not meeting their own high standards for non-inferiority, the authors demonstrated that most patients with acute appendicitis who are treated conservatively with antibiotics can avoid surgical intervention without suffering complications from delays to definitive care. To define such a revelation as inferior is unjust indeed.
- Salminen P, Paajanen H, Rautio T, et al. Antibiotic Therapy vs Appendectomy for Treatment of Uncomplicated Acute Appendicitis: The APPAC Randomized Clinical Trial. JAMA. 2015;313(23):2340.
- Kaji AH, Lewis RJ. Noninferiority Trials: Is a New Treatment Almost as Effective as Another? JAMA. 2015;313(23):2371-2.
- Kaul S, Diamond GA. Good Enough: A Primer on the Analysis and Interpretation of Noninferiority Trials. Ann Intern Med. 2006;145:62-69.
- Styrud J, Eriksson S, Nilsson I, et al. Appendectomy versus antibiotic treatment in acute appendicitis: a prospective multicenter randomized controlled trial. World J Surg. 2006;30(6):1033-1037.
- Hansson J, Körner U, Khorram-Manesh A, Solberg A, Lundholm K. Randomized clinical trial of antibiotic therapy versus appendicectomy as primary treatment of acute appendicitis in unselected patients. Br J Surg. 2009;96(5):473-481.
- Vons C, Barry C, Maitre S, et al. Amoxicillin plus clavulanic acid versus appendicectomy for treatment of acute uncomplicated appendicitis: an open-label, non-inferiority, randomised controlled trial. Lancet. 2011;377(9777):1573-1579.
- Wilms IM, De hoog DE, De visser DC, Janzing HM. Appendectomy versus antibiotic treatment for acute appendicitis. Cochrane Database Syst Rev. 2011;(11):CD008359.