Here at Neuroskeptic, we see a lot of bad science. Maybe, over the years (all 2 of them) that I've been writing this blog, I've become a bit jaded. Maybe I'm less distressed by it than I used to be. Cynical, even.

But this one really takes the biscuit. And then it takes the tin. And relieves itself in it: A New Population-Enrichment Strategy to Improve Efficiency of Placebo-Controlled Clinical Trials of Antidepressant Drugs.
Don't worry - it's from a big pharmaceutical company (GlaxoSmithKline), so I don't have to worry about hurting feelings.
It's full to bursting with colourful graphs and pictures, but the basic idea is very simple. As in "simpleton".
Suppose you're testing a new drug against placebo. You decide to do a multicentre trial, i.e. you enlist lots of doctors to give the drug, or placebo, to their patients. Each clinic or hospital which takes part is a "centre". Multicentre trials are popular because they're an easy way of quickly testing a drug on a large number of patients.
Anyway, suppose that the results come in, and it turns out that the drug didn't work any better than placebo, which unfortunately is what happens rather often in modern trials of antidepressants. Oh dear. The drug's crap. That's the end of that chapter.
...or is it?!? say GSK. Maybe not. They have a clever trick. Look at the results from each centre individually. Placebo response rates will probably vary between centres: in some of them, the placebo people don't get better, in others, they get lots better.
Now, suppose that you just chucked out all of the data from centres where the people on placebo got much better, on the grounds that there must be something weird going on in those ones. That's exactly what GSK did: they reanalyzed the data from 1,837 patients given paroxetine or placebo, across 124 centres. In the dataset as a whole, paroxetine barely outperformed placebo. However, in the centres where people on placebo only improved a little, the drug was much better than placebo!
Well, of course it was. Imagine that the drug has no effect. Some people just get better and others don't. Let's assume that each person randomly gets between 0 and 25 better, with an equal chance of any outcome. Half are on drug and half are on placebo, but it makes no difference.

Let's further assume that there are 50 centres, with 20 people per centre (1000 people total). I knocked up a "simulation" of this in Excel (it took 10 minutes). Here's what you get:
The blue dots show, for each imaginary centre, drug improvement vs. placebo improvement. There's no correlation (it's random), and, on average, there is no difference: both average out at about 12.5 points. The drug doesn't work.

The red dots show the "Treatment Effect", i.e. [drug improvement - placebo improvement]. The average is 0 - because the drug doesn't work. But there's a strong negative correlation between Treatment Effect and the placebo improvement - in centres where people improved lots on placebo, the drug worked worse.
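For anyone who wants to replay this without Excel, here's a minimal sketch of the same null simulation in Python (the setup matches the numbers above; the seed and variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(42)
n_centres, n_per_arm = 50, 10  # 20 patients per centre: half on drug, half on placebo

# Everyone improves by a random amount between 0 and 25; the drug does nothing
drug = rng.uniform(0, 25, size=(n_centres, n_per_arm))
placebo = rng.uniform(0, 25, size=(n_centres, n_per_arm))

placebo_mean = placebo.mean(axis=1)                  # per-centre placebo improvement
treatment_effect = drug.mean(axis=1) - placebo_mean  # per-centre "Treatment Effect"

print(treatment_effect.mean())                           # ~0: the drug doesn't work
print(np.corrcoef(placebo_mean, treatment_effect)[0, 1]) # strongly negative, around -0.7
```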
This is exactly what Glaxo show in Figure 1a (see above). They write:
"The analysis of the surface response indicated the predominant role of center specific placebo response as compared with the dose strength in determining the Treatment Effect of paroxetine."

But of course they correlate. You're correlating placebo improvement with itself: the "Treatment Effect" is a function of the placebo improvement. It's classic regression to the mean.
Of course if you chuck out the centres where people on placebo do well (the grey box in my picture), the drug seems to work pretty nicely. But this is cheating. It is cherry-picking. It is completely unscientific. (To give the authors their due, they also eliminated the centres where the placebo response was very low. This could, under some assumptions, make the analysis unbiased, but they don't show that this was their intention, let alone that it would eliminate all of the bias.)
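To put a rough number on how much the grey box flatters the drug, continue the sketch above by dropping the high-placebo-response centres (I only apply the high-end cut here, and the 25% cut-off is my own arbitrary choice, not the paper's):

```python
# Keep only the centres in the bottom 75% of placebo response
keep = placebo_mean < np.quantile(placebo_mean, 0.75)

print(treatment_effect.mean())        # ~0 across all centres
print(treatment_effect[keep].mean())  # typically around +1 point: a fake "effect"
```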
The authors note that this could be a source of bias, but say that it wouldn't be one if it was planned out in advance: "in order to overcome the bias risk, the enrichment strategy should be accounted for and pre-planned in the study protocol." This is like saying that if you announce, before playing chess, that you are going to cheat, it's not cheating.
To be fair to the authors, assuming the drug does work, this method would improve your chances of correctly detecting the effect. Centres with very high placebo responses quite possibly are junk. Assuming the drug works.
But if we're assuming the drug works, why are we bothering to do a trial? The whole point of a trial is to discover something we don't know. The authors justify their approach by suggesting that it would be useful for drug companies who want to do a "proof-of-concept" trial to find out whether an experimental drug might work under the most favourable conditions, i.e. whether they should bother continuing to research it.
They say that such trials "are inherently exploratory in their conception, aimed at signal detection, open to innovation..." - in other words, that they're not meant to be as rigorous as late-stage trials.
Fair enough. But this method is not even suitable for proof-of-concept, because it would (as I have shown above in my 10-minute simulation) increase your chance of finding an "effect" from a drug that doesn't work.
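To quantify that claim, here's a rough Monte Carlo under the same "drug does nothing" setup: apply the enrichment, then run an ordinary t-test. The cut-off and the 5% significance level are my own choices, not the paper's:

```python
import numpy as np
from scipy import stats

def enriched_trial_is_positive(rng, n_centres=50, n_per_arm=10):
    # Simulate one null trial: uniform 0-25 improvement, inert drug
    drug = rng.uniform(0, 25, size=(n_centres, n_per_arm))
    placebo = rng.uniform(0, 25, size=(n_centres, n_per_arm))
    # "Enrich" by discarding the top quarter of centres by placebo response
    keep = placebo.mean(axis=1) < np.quantile(placebo.mean(axis=1), 0.75)
    res = stats.ttest_ind(drug[keep].ravel(), placebo[keep].ravel())
    # A "significant" benefit of a drug that does nothing
    return res.pvalue < 0.05 and res.statistic > 0

rng = np.random.default_rng(0)
print(sum(enriched_trial_is_positive(rng) for _ in range(2000)) / 2000)
# Far above the ~2.5% you'd expect without the enrichment step
```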
Whether the drug works or not, this method will give you the same result - an apparent effect - so it's not useful evidence. It's like saying "Heads I win, tails you lose". You've set it up so that I lose - the coin toss doesn't tell us anything.
All of the authors' results are based on trials in which the drug "should have worked": they do not appear to have simulated what would happen if they used this method on trials of a drug that doesn't work, as I just did. So I'm doing Pharma a big favour by writing this post, because if they adopt this approach, they're more likely to waste money on drugs that don't work.
They should be paying me for this stuff.
Comments:
You have to love the tenacity of the drug companies. They will never give up devising ways to bamboozle the public and physicians. My only question is why a journal would publish this nonsense and thereby give it an air of scientific credibility?
So Neuro you shoot down my comments (for reasons of "incorrectness" I surmise) but you welcome the nutbars like veri and DM?
Anonymous: No I don't shoot down any comments. Blogger sometimes does if it thinks they are spam - I think including too many links is what causes that.
I might be missing something here, but if you suspect there is considerable between-clinic variation, the right way to go would be to fit a multilevel model with clinics as a random effect. If one of the clinics is a clear outlier (showing a hugely different pattern from the others) you might want to chuck it away completely. But certainly not 10% of your data.
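Something like this, sketched with statsmodels on invented stand-in data (all the numbers and column names below are illustrative):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Invented stand-in data: one row per patient, with genuine between-centre variation
rng = np.random.default_rng(1)
rows = []
for centre in range(50):
    centre_effect = rng.normal(0, 3)  # some centres run high, some low
    for arm in ("placebo", "drug"):
        for _ in range(10):
            rows.append({"centre": centre, "arm": arm,
                         "improvement": 12.5 + centre_effect + rng.normal(0, 7)})
df = pd.DataFrame(rows)

# Random intercept per centre; fixed effect of treatment arm
result = smf.mixedlm("improvement ~ arm", data=df, groups=df["centre"]).fit()
print(result.summary())
```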
No I don't shoot down any comments. Blogger sometimes does,
That's VERY INTERESTING if Blogger did indeed shoot down my last reply to veri in the "The Rise of the Mouse" thread (which would have been comment 20).
It had only one link, I was trying to scare off veri by presenting myself as a devil's employee, and I cannot see how that kind of censorship could be automated, because the link was to an image!!!
And the comment did show up for a while, so someone took it off.
Big brother cares for you...
(once again I had to remove the link above!!!)
Obviously an angel.
"You have to love the tenacity of the drug companies. They will never give up devising ways to bamboozle the public and physicians."
Right, although in this case, they seem to be suggesting it as a method for early-phase trials which are most likely to end up bamboozling themselves.
Although they do, ominously, suggest that it might also be useful for late stage efficacy trials: "In principle, this approach could be also adapted to obtain better information about efficacy in late clinical development RCTs as well, in which the design and analyses are highly regulated. However, the implementation of this approach would require a more thorough assessment of the statistical implications by the clinical and regulatory scientific communities."
"My only question is why would a journal would publish this nonsense and thereby give it an air of scientific credibility?!"
I have no idea...
One can sympathize with the goal of improving the efficiency of clinical trials by efforts to maximize the drug-placebo difference. However, the best way to do that has not been mentioned here – it is to select for the investigative team only those sites which have demonstrated track records of discriminating between active reference drugs and placebo. To be considered as a clinical site, each center needs to show that its procedures for recruiting and screening yield valid cases – and the acid test of validity for now is the drug-placebo difference with a reference antidepressant. Adoption of this practice would eliminate many of the weaker CROs whose data just muddy the water.
Hi, I just came across your blog and think that it's awesome that you take a critical look at the scientific world (because isn't that what science is, after all?). I don't mean to dog science in general (being a young researcher myself), but it's important to have a few checks on the system, and a healthy dose of skepticism is at the heart of empiricism.
Since you're in the UK, I was wondering if you'd heard of an organization called "Sense About Science." They seem to have a similar mission as your own - check them out!
Anyways, thanks for the blog, it's really interesting.
Regarding comments by Mr Carroll, who is apparently a past chairman of an FDA drugs committee and of Duke's psychiatry dept.
What guarantee do we have that the centres with a track record of reporting a larger difference between placebo and a reference antidepressant aren't those which engage in more slack or unethical practices on the ground, like breaking of double-blinds or corrupt links with drug companies?
And let's not forget that the latter apparently extends to drug company reps seducing investigators and pumping them for information, as well as just the usual underhand financial dealings and god knows what else.
I understand both the FDA And Duke University have very strong links with pharmaceutical companies.
And if the acid test of whether a centre is getting the right sort of depressed people enrolled in their trial, is whether their participants in past studies showed the biggest difference between placebo and reference antidepressant, that sounds like it's already pre-biasing the results. Retrospectively judging study quality based on whether it got the results you think it should have got. And thereby excluding people who are just as in need of help but who 'muddy the waters' by not responding to antidepressants, even though the study results will be sold on the basis of proven efficacy for "depression" as if it's the same for everyone.
And those excluded people will no doubt end up being prescribed the same medications in the same shoddy way as everyone else, without proper support or monitoring of the risks - something GPs here have been shown to be failing at, despite explicit warnings about those risks and the need for monitoring.
On what basis is the reference antidepressant chosen, and what makes the findings on that any more reliable as a baseline?
A scientific goal would be to maximise the ability of a study to accurately report either a larger or smaller difference from placebo, not to "sympathise with... efforts to maximize the drug-placebo difference".
Anonymous:
I believe the correct title is DR Carroll and not MR. He has earned it, even if I do not always agree with him.
It wasn't intended as a slight; I just had more important things on my mind, and it doesn't say on his profile.
Anonymous today: thank you for your comments.
I think your major points bear on what I have called elsewhere our present epistemologic quagmire for establishing the efficacy of antidepressant drugs (see link below). This state of paralysis and uncertainty stems from the dumbing down of diagnostic criteria for depression that began with DSM-III and from the increasing enrolment of nonclinical subjects by academic and commercial CROs in antidepressant trials. They may meet nominal criteria for the diagnosis of depression but they are not cases in the way that clinically referred subjects are. The result has been a steady rise in the response rate to placebo. This trend challenges the validity of the diagnoses for subjects enrolled in those trials.
I agree with you that the non-valid subjects are deserving of help – they just don’t belong in an antidepressant drug trial.
As for the acid test of diagnostic validity, differential response to drug versus placebo is all we have in the absence of biomarkers.
http://hcrenewal.blogspot.com/2009/04/in-defense-of-psychiatric-diagnoses-and.html
Anonymous left a comment in reply but it seems to have got caught in the ridiculous 'spam filter' again. It doesn't block any of the 10 actual spam messages I get daily...
"Thank you for your response Dr Carroll.
Your very interesting blog article that you link to cites an article by Joanna Moncrieff. As I recall, and have heard her speak on, Dr Moncrieff argues against the disease-based model of drug action that you say is the acid test of diagnostic validity in psychiatry, e.g. here (http://www.thepsychologist.org.uk/archive/archive_home.cfm?volumeID=20&editionID=147&ArticleID=1185). I believe NeuroSkeptic recently posted on a similar point re. shotgun psychiatry.
Regarding the alleged dumbing down of diagnostic criteria for depression and enrolment of too many people with 'nominal' rather than 'clinical' depression, I gather from some quick Googling that that opinion may be based on a wish to return to the abandoned distinction between melancholic/endogenous vs exogenous depression.
I don't see how any of this justifies post-hoc cherry-picking of results and study centres.
Btw re the original comments from others about having no idea why a journal would publish self-serving unscientific stuff from pharmaceutical companies. Really? I don't know about this particular journal but I'm guessing it has something to do with what the editor of the BMJ was pointing out at least as early as 2005 (http://www.plosmedicine.org/article/info%3Adoi%2F10.1371%2Fjournal.pmed.0020138), that medical journals have effectively become extensions of the marketing arm of pharmaceutical companies, and as a PLOS review found in 2006 "Advertisements and other financial arrangements with pharmaceutical companies compromise the objectivity of journals". (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1450016/)
WOW. You have it. You should be advising govts.
@ Anonymous again: I sense we are converging somewhat. But in the matter of selecting clinical sites it is a case of pre hoc cherry picking rather than post hoc cherry picking. It just is the case that there are elite clinicians and pedestrian clinicians, just as in all other activities. There also are elite clinical research programs and pedestrian clinical research programs. When your grandmother needs hip replacement surgery will you send her to a facility that has a 50% failure rate or to one with a 5% failure rate? Both claim to be following identical standard procedures. The best predictor of future performance is past performance, not the nominal procedures the two sites claim to be following.
The decision to abandon the melancholic versus nonmelancholic distinction in the universe of depressed patients is indeed what prompted me to use the term dumbing down of diagnostic criteria. I think that was a serious misstep by the field.
I agree that there has lately been intense commercial desire to mass market antidepressant drugs, which has resulted in much inappropriate prescribing of these agents. The dumbing down of depression diagnoses opened the door for disease mongering in the service of mass marketing. We need to keep in mind, however, that the world of marketing is not the world of clinical proof of drug efficacy.
Neuro: YOU ROCK!!!!
In order to obtain FDA approval, double blind studies are required. This type of bias should be eliminated by a scientific study design. A company would not get approval based on a study where negative results are eliminated.
Nothing new under the sun:
Why Most Published Research Findings Are False
There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field.
http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0020124
@Bernard Carroll again
You appear to be talking about pre hoc selection of future study centres based on post hoc evaluation of past results, rather than on actual identified issues with the methods and procedures used.
But it is misleading to equate the failure of an intervention with the failure of a controlled scientific experiment testing an intervention. The latter can of course be a success by correctly identifying that a proposed treatment does not in fact work, or a failure by wrongly indicating that it does.
If the goal is to make sure that only the most severely depressed people are enrolled in studies, on the assumption that they are the only ones who respond significantly better to antidepressants than to placebo, and if you suspect that some study centres aren't enacting the standard procedures well enough to achieve this, then surely the scientific thing to do is to put in more supervision and analysis of the procedures actually being used on the ground to enrol people.
@ Anonymous again: your last comment just brings us back to the epistemologic quagmire.
This is not just an issue of severity but of typology.
You proposed that "…surely the scientific thing to do is to put in more supervision and analysis of the procedures actually being used…" All I can say to that is: good luck! We come back to the ineluctable fact that right now the only practical validator is the drug-placebo difference in response to a reference antidepressant drug. What should that drug be? Certainly not an SSRI agent.
@Bernard Carroll again
ahh i had a feeling this might somehow relate back to tricyclics or maoi's, along with the melancholic classification :)
Melancholia yes. Tricyclics yes. MAOIs probably not. Mirtazapine maybe. Bupropion maybe. Venlafaxine maybe. Duloxetine probably not.
Ah! Yeah! Epistemology, Statistics...
Great info. You've hit on some of the reasons I don't allow drug reps in my practice. Click on my name to find out more...