So, Gilead’s first RCT on remdesivir was just published, and it’s very interesting [1]. Gilead’s, you say? Yep. The study was designed, monitored, analyzed, and written by Gilead.
Before getting into the study, let’s take a moment and think about what Gilead’s first RCT could look like. Gilead knows more about remdesivir than anyone (they built it). So, their RCT ought to be a tour de force. More than anyone else, they ought to know which patients to select, what dose to use, which endpoints to evaluate, and how to present the results. If remdesivir were highly effective, this study should have been a slam dunk.
preamble: some interesting bits about the study design
Since the study is designed by Gilead, the design itself reveals some interesting bits about what the company truly believes about remdesivir. Some issues bear particular mention.
(1) Lack of placebo control
The most unusual aspect of this study is the lack of a placebo arm. Why? One might imagine roughly two possibilities:
- Excess optimism: Gilead assumed that remdesivir would be a wonder-drug that obviously works, thereby obviating the need for a large placebo-controlled trial. Or perhaps Gilead assumed that the Wang et al. study would establish the efficacy of remdesivir (that study actually turned out rather neutral) [2]. Either way, Gilead was banking on the assumption that the drug would clearly work, so all they needed to do was establish the appropriate duration of therapy.
- Lack of confidence: Gilead didn’t have much confidence that remdesivir is the cure, but they were hoping that it would get accepted anyway (out of haste and desperation). In this scenario, testing remdesivir against placebo would risk exposing remdesivir as ineffective.
My guess is scenario #1. Regardless, the lack of a placebo control makes this study difficult to interpret.
(2) Changing the primary endpoint
The original primary endpoint of the study was normalization of temperature and oxygen saturation through day 14. On March 15, before any data were available, this was changed to an assessment of clinical status using a 7-point ordinal scale. An ordinal scale is a more sensitive metric for small improvements in outcome. This switch suggests that Gilead may have come to recognize that remdesivir was less effective than they initially thought.
(3) Lack of secondary endpoints
Most RCTs suffer from the opposite problem: an excess of secondary endpoints, which makes it likely that at least one of them will come up positive by chance alone.
This study design has only a single pre-specified secondary endpoint (a safety endpoint). This is quite unusual. In a way, it suggests that perhaps Gilead was avoiding really kicking the tires here – they didn’t want to look too closely into what was going on with this study.
(4) Exclusion of patients with GFR <50 ml/min
It remains controversial whether remdesivir is nephrotoxic. Prior studies have quietly excluded patients with GFR <30 ml/min (an exclusion criterion that has been largely ignored by the NIH guideline recommendations).
This study excluded any patients with the slightest hint of renal dysfunction (GFR <50 ml/min). This suggests that Gilead is not confident that remdesivir is safe for patients with renal dysfunction (even for patients with borderline renal dysfunction, i.e., GFR 30-50 ml/min).
This is a multi-center, open-label, phase 3 trial comparing two regimens of remdesivir among hospitalized COVID-19 patients (either 5 or 10 days of therapy).
Key inclusion criteria were:
- Oxygen saturation 94% or lower on room air
- Radiologic evidence of pneumonia
- PCR assay within four days of randomization
- Not intubated or on ECMO, nor in multi-organ failure
- AST or ALT not above 5 times the upper limit of normal
- Glomerular filtration rate >50 ml/min (by the Cockcroft-Gault equation)
- Age >11 years old
- Women included only if not pregnant. Both men and women were required to use contraception (if relevant).
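For reference, the renal criterion above rests on the Cockcroft-Gault estimate of creatinine clearance. Here is a minimal sketch of the standard formula. The function names and the example patient are made up for illustration, and the use of actual body weight is an assumption (the post doesn’t say which weight the trial used).

```python
def cockcroft_gault(age_years, weight_kg, serum_creatinine_mg_dl, female):
    """Estimated creatinine clearance in ml/min (standard Cockcroft-Gault formula).

    Assumes weight in kg and serum creatinine in mg/dL.
    """
    if serum_creatinine_mg_dl <= 0:
        raise ValueError("serum creatinine must be positive")
    clearance = (140 - age_years) * weight_kg / (72 * serum_creatinine_mg_dl)
    # The formula applies a 0.85 multiplier for women.
    return clearance * 0.85 if female else clearance


def meets_renal_criterion(clearance_ml_min, threshold_ml_min=50):
    """True if the estimate clears the trial's stated cutoff (>50 ml/min)."""
    return clearance_ml_min > threshold_ml_min


# Hypothetical patient, not taken from the trial: 60 years old, 72 kg, creatinine 1.0 mg/dL.
estimate = cockcroft_gault(60, 72, 1.0, female=False)
print(round(estimate, 1), meets_renal_criterion(estimate))
```

Note how sensitive the estimate is to age and creatinine: the same hypothetical patient with a creatinine of 2.0 mg/dL would fall to 40 ml/min and be excluded.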
Here is where things start getting controversial. The two groups are fairly similar, although slightly more patients in the 10-day group required intubation or noninvasive respiratory support (69 vs. 53).
How big of a difference is this? It’s debatable:
- Using a Fisher’s exact test, 69/197 vs. 53/200 isn’t statistically significant (p = 0.08).
- Using a Wilcoxon rank sum test, the difference is statistically significant (p = 0.02).
- Incidentally, the use of any statistical test here is arguably invalid (we know that the patients were randomized, so we should already know that the null hypothesis is true!).
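For anyone who wants to check the Fisher’s exact figure, here is a minimal sketch using only the Python standard library. The function name `fisher_exact_two_sided` is made up for illustration, and the two-sided rule it uses (add up every table that is no more likely than the observed one) is one common convention, assumed here rather than taken from the paper. The counts 128 and 147 are simply 197 - 69 and 200 - 53.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table (with the same
    margins) that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so ties with the observed table are counted.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-7)
    )


# Rows: 10-day group, 5-day group. Columns: needed support, did not.
p_value = fisher_exact_two_sided(69, 128, 53, 147)
print(f"p = {p_value:.3f}")
```

The result should land close to the p = 0.08 quoted above, on the non-significant side of the conventional 0.05 cutoff.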
Patients receiving longer courses of remdesivir did worse by a variety of metrics.
Are these differences significant? Well, that’s debatable. In an unadjusted analysis, patients in the 5-day group did better. However, an adjusted analysis based on initial illness severity shows no statistically significant difference.
In a randomized controlled trial, randomization should ideally eliminate baseline differences between patient groups. So generally, adjustment based on baseline variables is unnecessary. However, adjustment of RCTs based on baseline differences is occasionally performed. For example, this might be appropriate in the following situations:
- Recruitment of an adequate sample size is difficult. Pre-planned adjustment for baseline characteristics could help remove confounding variables, thereby improving the power of the study.
- There is an unexpected difference in baseline characteristics between groups, due to bad luck. Post-hoc adjustment could be used to estimate the impact of this imbalance.
Use of an adjusted statistical analysis here seems a little dubious. It feels a bit like patients receiving remdesivir for longer courses did worse, so the authors are covering this up with some statistical wizardry.
post-hoc subgroup analysis
Post-hoc subgroups were evaluated to see if there might be any patient population where giving more remdesivir could be helpful. Well, run enough statistical tests on enough subgroups and…
Of patients on invasive mechanical ventilation, those treated with 10 days of remdesivir had lower mortality (7/41 vs. 10/25, p=0.048). There are a few reasons that this analysis isn’t valid. First, considering the multiplicity of comparisons in this post-hoc subgroup analysis, a p-value of 0.048 isn’t exciting. Second, these subgroups were generated based on clinical status on day #5 – five days after patients had started therapy! They essentially re-drew the starting line for the race, several days into the study! This is wild – you shouldn’t initiate a therapy, wait several days for some patients to deteriorate, and then initiate a subgroup analysis.
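To put a number on the multiplicity point, here is a minimal sketch of how quickly false positives pile up. It assumes the tests are independent, which is a simplification, and the test counts in the loop are hypothetical (the post doesn’t say how many subgroups were examined). The function names are made up for illustration.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_threshold(num_tests, alpha=0.05):
    """Per-test significance cutoff after a Bonferroni correction."""
    return alpha / num_tests


reported_p = 0.048  # the ventilated-subgroup p-value quoted above

# Hypothetical numbers of subgroup comparisons.
for num_tests in (1, 2, 5, 10):
    fwer = familywise_error_rate(num_tests)
    cutoff = bonferroni_threshold(num_tests)
    survives = reported_p < cutoff
    print(f"{num_tests:>2} tests: chance of a false positive {fwer:.2f}, "
          f"Bonferroni cutoff {cutoff:.4f}, p = 0.048 survives: {survives}")
```

Under these assumptions, a p-value of 0.048 stops being significant as soon as there is even one other comparison in the family.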
Patients treated with longer courses of remdesivir had higher rates of serious adverse events, especially renal failure.
The authors attempt to explain away these differences on the basis of baseline imbalance between the two patient groups (again). That’s possible, but the creatinine values were essentially identical at baseline. Furthermore, even when performing an adjusted analysis which takes into account baseline differences in disease severity, there was still a significant increase in serious adverse events among patients receiving longer courses of remdesivir.
Prior RCTs on remdesivir in COVID-19 have not reported increased rates of renal failure, so this could very well be a statistical anomaly. However, it remains concerning.
- This is a trial designed, monitored, and written by Gilead. In some ways, the design of the trial and its missing parts are more notable than what is actually reported in the study (e.g., secondary endpoints, viral load data).
- Patients treated with longer courses of remdesivir (10 days vs. 5 days) had worse outcomes. It’s unclear whether this is due to baseline imbalance between the groups, or toxicity from remdesivir.
- Patients were included in the study only if they had a GFR >50 ml/min, suggesting that Gilead might lack confidence regarding whether remdesivir is safe in patients with renal dysfunction. There were higher rates of kidney injury among patients receiving longer courses of remdesivir.
- If a decision is made to use remdesivir, it should be limited to a 5-day course.
- Lack of a placebo group prevents this study from evaluating whether or not remdesivir works. However, the study’s construction and results do raise some red flags. Given that prior placebo-controlled RCTs of remdesivir have failed to demonstrate durable clinical benefit, further placebo-controlled trials are required prior to concluding that remdesivir provides meaningful benefit in COVID-19.
related data on remdesivir
- NIAID ACTT-1 trial
- Wang trial in Lancet
- NEJM “compassionate use” study
- COVID AKI chapter at NephJC (multi-author awesomeness).
- 1. Goldman JD, Lye DCB, Hui DS, et al. Remdesivir for 5 or 10 Days in Patients with Severe Covid-19. N Engl J Med. Published online May 27, 2020. doi:10.1056/nejmoa2015301
- 2. Wang Y, Zhang D, Du G, et al. Remdesivir in adults with severe COVID-19: a randomised, double-blind, placebo-controlled, multicentre trial. The Lancet. Published online May 2020:1569-1578. doi:10.1016/s0140-6736(20)31022-9