
The Last Psychiatrist's take is that "The Decline Effect" just represents sloppy thinking, treating different things as if they were all instances of The One True Phenomenon. Someone does a study about something and finds an effect. Then someone else comes along and does a new study, of a related but different topic, and finds a different result. Both are right: there's a difference. Only if you, sloppily, decide that both studies were measuring the same thing does the "Decline Effect" appear.
This is perfectly true, and I've touched on it before, but I think it's a bit optimistic. It assumes that the first study was true. Sometimes it is. But because of the way science is published at the moment, a lot of the results that get published are flukes. Some even say that the majority are.
The problem is that there are so many ways to statistically analyze any given body of data that it's easy to test and retest it until you find a "positive result" - and then publish that, without saying (or only saying in the small print) that your original tests all came out negative. Combine this with selective publication of only the best data, and other scientific sins, and you can pull positive results out of the hat of mere random noise.
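To see how cheap this trick is, here's a toy simulation - my own sketch with made-up numbers (30 subjects per group, ten outcome measures), not anything from the studies under discussion. Generate pure noise, analyse it ten different ways, and "publish" whenever any one test comes out significant:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_studies, n_per_group, n_outcomes = 1000, 30, 10  # arbitrary illustrative numbers

    hits = 0
    for _ in range(n_studies):
        # Two groups with NO real difference, measured on ten outcomes
        a = rng.normal(size=(n_per_group, n_outcomes))
        b = rng.normal(size=(n_per_group, n_outcomes))
        pvals = [stats.ttest_ind(a[:, i], b[:, i]).pvalue for i in range(n_outcomes)]
        # "Publish" if any one of the ten tests looks significant; forget the rest
        if min(pvals) < 0.05:
            hits += 1

    print(f"Studies with at least one p < 0.05: {hits / n_studies:.0%}")
    # Roughly 1 - 0.95**10, i.e. about 40% of pure-noise studies yield a "positive result"

Pre-specify a single primary outcome in advance and the hit rate falls back to the advertised 5%.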

"Perhaps, just as the act of observation has been suggested to affect quantum measurements, scientific observation could subtly change some scientific effects. Although the laws of reality are usually understood to be immutable, some physicists, including Paul Davies, director of the BEYOND: Center for Fundamental Concepts in Science at Arizona State University in Tempe, have observed that this should be considered an assumption, not a foregone conclusion."

Hmm. Maybe. But there is really no need to posit such magical mysteries when plain old statistical conjuring tricks seem like a perfectly good explanation. In my view, Schooler's proposed repository of raw results wouldn't explain the decline effect; it would just make it disappear.
Schooler doesn't go into detail as to how this repository would be set up, but he does cite the fact that we already have a pretty good one for clinical trials of medicines conducted in the USA. Anyone running a clinical trial is required to register it in advance, saying what they're planning to do and, crucially, spelling out which statistics they're going to run on the data when it arrives.
What's really silly is that most scientists already do this when applying for funding: most grant applications include detailed statistical protocols. The problem is that these are not made public, so people can ignore them when it comes to publication. Back in 2008 I suggested that scientific journals should require all studies, not just clinical trials, to be publicly pre-registered if they're to be considered for publication. This would be eminently doable if there was a will to make it happen.
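For what it's worth, a pre-registration entry needn't be elaborate. Here's a purely hypothetical example sketched as a Python dict; the field names and values are invented for illustration and don't correspond to any actual registry's format:

    # A purely hypothetical pre-registration record (all fields invented for illustration)
    preregistration = {
        "title": "Effect of drug X on symptom score Y",        # made-up study
        "primary_outcome": "change in Y at 8 weeks",
        "planned_n": 120,
        "primary_analysis": "two-sample t-test, two-tailed, alpha = 0.05",
        "secondary_analyses": ["regression adjusting for age and baseline Y"],
        "exclusion_criteria": "defined before unblinding",
    }

The point is simply that anything not listed in advance would be flagged as exploratory when the paper eventually appears.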

10 comments:
There have been a number of calls in some disciplines (e.g. sociology) to emulate other disciplines (e.g. economics) in adopting a policy of complete data transparency. Too bad medicine has not followed suit: http://www.ncbi.nlm.nih.gov/pubmed/21240810
Even if people are very trustworthy and only apply the statistics they promised they would, we would still have enough flukes to fill Nature and Science. It's something that a Bonferroni correction, or lowering your p-value threshold, can't fix either, only slow down.
For instance, suppose ALL of some large set of null hypotheses were actually true: we would still reject enough of them to generate a steady stream of statistically significant results, even using proper stats.
(a wonderful but depressing post on this: http://cscs.umich.edu/~crshalizi/weblog/698.html )
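To put rough numbers on that, here is my own toy check with arbitrary assumptions (10,000 true null hypotheses, 40 observations each, honest one-sample t-tests at p < 0.05):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n_tests, n_obs, alpha = 10_000, 40, 0.05  # arbitrary illustrative numbers

    # 10,000 honestly analysed studies where the null hypothesis is exactly true
    pvals = np.array([stats.ttest_1samp(rng.normal(size=n_obs), 0.0).pvalue
                      for _ in range(n_tests)])
    print((pvals < alpha).sum(), "significant results out of", n_tests, "true nulls")
    # Expect around 500 - a steady stream of flukes with no misconduct at all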
Anon, economists run the biggest experiments in the world: capitalism and the IMF, to name two. When the stakes are as high as poverty and war resulting from mistreated data, it seems natural that any data, verified or not, published or not, are just as appealing and delicious. I'd say they have no problem with data repositories, or with retrieving from government data repositories, because there's plenty of incentive, like monitoring and protection from a national security point of view. The same goes for clinical trials, food trials, husbandry, military uniforms, energy usage trials and so on.
If you think about it, that arbitrary 0.05/0.025 figure is utilitarian: it forever relies upon the majority to be sane. But I don't think scientific truths are necessarily utilitarian in nature, as seen in the hard sciences like physics. So when truths are tied to a social context like a bell curve or whichever distribution, then I presume the decline effect can be expected from a sociological perspective. For example, the participant who donned big hair and disco shorts and roller skated to work in the 60s is unlikely to represent the stereotypical majority today. The experiment worked because their brains were wired to respond positively to fluoro spandex. But if a scientist today were a major replication geek and recruited that selective cohort, first of all that would be bias, and secondly, if someone was sporting uber lame in this day and age, they don't represent a random sample, they need help, shoot them I mean help them.
To get an accurate depiction of what's really going on, perhaps in the future genetic sequencing and computer algorithms will be refined in such a way that scientists can plug in the characteristics they're after and, voila, statistical models and physical observations will no longer be relied upon solely to define evidence-based scientific truths. For the meantime, data repositories sound cool, but I don't think they will alleviate or reveal bongo-banging phenomena in the possible decline effect outside the scope of social and equipment discrepancies, or make scientific practices more palatable to investors.
A data repository sounds nice, but who has time to sift through all that data? It may work for clinical trials, which produce data in a more or less standard format, but for less standard procedures there may not be any efficient way to process the data in bulk. For example, the research I do involves processing signals in a high-noise environment. The analysis is all customized and takes a lot of time to do.
I think a lot of the problem stems from the incentives. Being the first to publish a phenomenon can make a career... if it turns out you were wrong, it's not nearly as big a deal. Being quick and being interesting are more important than being accurate, especially when you urgently need money (which is common in this funding climate). They don't take your grant away if another lab fails to confirm your findings.
Brian: that's an excellent point. An example of this is the recent 'power law scaling' of websites. When this was first published, improper statistics were used, leading to very wrong conclusions regarding preferential attachment (in some cases). But no matter, the original papers still have ~10,000 citations.
When a funding committee (which cannot reasonably be expected to check such things for every applicant) compares two applications, one from the guy with 10k citations and one from the guy who took his time, who do you think is going to get funded?
I think this is one reason for the vehemence of the 'voodoo correlations in social neuroscience' folks: by rushing ahead with improper methods, a lot of people were able to get super sexy, super flashy results that made for great headlines (scientists discover sex center in brain!) but poor science (in some cases).
I'm not a big fan of Ioannidis's paper, as it's based on a priori Bayesian conjuring, but Marcus Munafo has done some good stuff on this topic (e.g. here)
When you report a novel finding for the first time, you have to be extra cautious in drawing conclusions. I was also wondering about the potential role of reviewers; they can, to a certain extent, impose stringent rules during evaluation, especially on the statistics.
Another factor contributing to the frequency of the decline effect in the cognitive sciences is the not-uncommon exclusive reliance on p-values for evaluating the significance of an effect. A p-value speaks only to the evidence that there is *some* effect, not to the size of the effect. Results with conventionally significant p-values will necessarily overestimate effect sizes when studies are underpowered, whether from insufficient sample size or from correction for a large number of comparisons. Instead of just reporting p-values, one should report effect-magnitude confidence intervals. Since confidence intervals provide a direct sense of how big an effect is likely to be, they can alert one to the possibility of effect size overestimation. See here for more info:
http://spikesandwaves.wordpress.com/2011/02/24/the-truth-wears-off-sometimes-and-the-importance-of-confidence-intervals/
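A toy simulation makes the overestimation concrete; the numbers are my own, purely illustrative assumptions (a true effect of 0.2 SD and an underpowered 20 subjects per group):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    true_d, n, sims = 0.2, 20, 5000  # small true effect, small groups (assumed numbers)

    estimates, significant = [], []
    for _ in range(sims):
        a = rng.normal(true_d, 1.0, n)   # "treatment" group, true mean = 0.2
        b = rng.normal(0.0, 1.0, n)      # "control" group
        estimates.append(a.mean() - b.mean())        # effect estimate (SD ~ 1, so ~ Cohen's d)
        significant.append(stats.ttest_ind(a, b).pvalue < 0.05)

    estimates, significant = np.array(estimates), np.array(significant)
    print("true effect:                ", true_d)
    print("mean estimate, all studies: ", round(estimates.mean(), 2))
    print("mean estimate, p<0.05 only: ", round(estimates[significant].mean(), 2))
    # The significant subset overestimates the true effect severalfold; a 95% CI on any
    # single study here spans roughly +/- 0.6, which makes that imprecision hard to miss.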
This seems strangely similar to the Gartner hype cycle. Or maybe that's just another effect that will vanish on replication.