Saturday, 28 July 2012

Catching Fraud: Simonsohn Says

Everyone's been talking about psychologist Uri Simonsohn and his role in the downfall of two scientific fraudsters.


When the story first broke, the methods Simonsohn used that allowed him to spot the dodgy data were mysterious - which only added to the buzz. The paper revealing the approach is now up online and it's a must-read. It's not often a statistics paper offers the train-wrecky schadenfreude of watching two fraudsters' careers come to a well-deserved end.

What's rather disturbing about the article, however, is that it doesn't really contain much that's new, in principle. Simonsohn used statistics to spot data in published papers that were, in effect, 'too good to be true'. He then followed up seemingly dodgy cases with some more stats, using simulations of what real data ought to look like, to verify that the numbers were in fact made up. A simple idea in retrospect, but one that had never been tried before. I don't think there's a single "Simonsohn method"; rather, the paper uses multiple techniques, each one tailored to the particular data in question.

But it shouldn't have come to this. Someone else ought to have spotted that the data looked dodgy.

Take this table from one of Simonsohn's conquests, a soon-to-be-retracted paper by Lawrence J Sanna et al:

We now know that the data from Studies 2, 3 and 4 were all made up. Each study compared 3 conditions, and what makes these data dodgy is that the standard deviations of the 3 sets of results within each study were almost identical. The chances of that happening with real data are very low, which suggests that someone (clumsily) made the numbers up.
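To get a feel for how that kind of 'too good to be true' check works, here's a minimal simulation sketch in Python. It is not Simonsohn's actual code, and the sample size, population SD and 'observed' SDs below are invented purely for illustration; the question it asks is simply how often honestly sampled data would produce three standard deviations as tightly clustered as the ones reported.

# Minimal sketch of a 'too-similar SDs' check. NOT Simonsohn's code;
# the sample size, population SD and 'observed' SDs are made up.
import numpy as np

rng = np.random.default_rng(0)

n_per_group = 15                     # hypothetical participants per condition
pop_sd = 25.0                        # hypothetical population SD
observed_sds = [24.8, 25.1, 24.9]    # suspiciously similar (invented) values

observed_spread = np.std(observed_sds)   # how tightly the three SDs cluster

n_sims = 100_000
as_extreme = 0
for _ in range(n_sims):
    # Draw three independent groups the honest way and record their sample SDs
    sim_sds = [np.std(rng.normal(0.0, pop_sd, n_per_group), ddof=1)
               for _ in range(3)]
    if np.std(sim_sds) <= observed_spread:
        as_extreme += 1

print(f"Proportion of honest datasets with SDs this similar: {as_extreme / n_sims:.5f}")

A tiny proportion for a single study is only a hint, of course; it's when study after study in the same paper comes out as wildly improbable that the alarm bells should really start ringing.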

I'm going to say that these data are obviously suspicious, at least to anyone who has worked with real data. Maybe you'll say that hindsight is 20/20, but Simonsohn didn't need hindsight and the stats he used were nothing remarkable. I'm not saying that to belittle his achievements; he deserves plenty of credit. But other people deserve blame.

Namely, whoever peer reviewed this paper should have spotted that these data looked unusual - and they should not have needed any special statistical tools to do so.

Simonsohn calls for journals to require that the raw data be made available for all published work, on the grounds that it would make fraud easier to detect. That's a great idea - and not just because it would help catch bad science: it would facilitate proper research and teaching no end. But Simonsohn didn't need the raw data to detect these cases of fraud - he only checked the raw data to confirm suspicions based on the published results.

Checking that the data are valid is the job of peer reviewers, and they dropped the ball. Instead, Simonsohn had to conduct his own private crusade against fraud... a bit like Batman. Batman is awesome, but the point about Batman is that he's only needed because the police can't or won't cope on their own. He's not a superhero; he's just a guy with the will to act.

Peer reviewers are the police of science, but all too often, they're asleep on the job. Not just in psychology. Retraction Watch provides plenty of examples of published results in biology that were faked, often in comically crude fashion, and should have been obvious to anyone paying attention.

Peer reviewers are usually anonymous. I wonder if a policy of retrospectively naming and shaming the reviewers when a paper turns out to have been fraudulent, might help motivate them...?

14 comments:

deevybee said...

I disagree! I don't think unusually similar SDs are necessarily easy to spot even when you are alerted to the possibility. I wrote a little routine in R to apply Simonsohn's method (http://www.slideshare.net/deevybishop/simonsohn-rprogram-13717559) and applied it to a couple of papers I had suspicions about. Both emerged as OK on the SD front - though interestingly, being forced to look at the methods in detail revealed other problems for both.

Marcus Munafo said...

But Simonsohn didn't pick the articles at random - they were selected for further interrogation because something about them didn't look right. The problem is that "something doesn't look right" isn't an acceptable peer review comment. Perhaps it should be.

Anonymous said...

The problem in most neuroscience is not fakery but wishful thinking. And that wishful thinking goes at all levels from grad students to granting agencies.

Trafton said...

I don't think it's a reasonable expectation for reviewers to have to check the validity of all SDs in every paper they review. This should not be their job. Perhaps journals could have editorial staff who check this sort of thing for every paper instead?
This seems to be conflating the job of determining whether something is relevant/interesting with the job of determining whether it is real. Perhaps high-impact journals could try to handle the latter while reviewers handle the former? Wishful thinking, I suppose.

Neuroskeptic said...

Trafton: I think it is the peer reviewer's job. Ultimately, it's the journal's job to check the validity of the papers they publish. But they outsource this judgement to peer reviewers; so the responsibility to catch problems is the reviewers'.

"Problems" here covers everything from "It's fine, but a bit old hat" up to "It's fraud". Some journals will publish old hat, but no journal ought to be happy with fraud...

Anonymous said...

A far more common problem than outright fraud is inappropriate statistical analysis, such as using a t-test when comparing two sets of ordinal (not interval) data, failing to check for multicollinearity, etc. These get through journals all the time and produce incorrect conclusions.
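To make that concrete with a toy example (the ratings below are invented, not taken from any real paper), here is how a t-test, which treats the ratings as interval data, sits alongside a rank-based alternative such as Mann-Whitney:

# Toy illustration only: invented 1-5 ratings, not from any real study.
from scipy.stats import ttest_ind, mannwhitneyu

group_a = [1, 1, 2, 2, 2, 3, 3, 5, 5, 5]
group_b = [2, 2, 3, 3, 3, 3, 4, 4, 4, 4]

t_stat, t_p = ttest_ind(group_a, group_b)                              # assumes interval-scale data
u_stat, u_p = mannwhitneyu(group_a, group_b, alternative="two-sided")  # rank-based alternative

print(f"t-test p = {t_p:.3f}, Mann-Whitney p = {u_p:.3f}")

The two tests rest on different assumptions and can give different answers; reviewers who can't tell when one is inappropriate will wave the wrong one through.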

I think reviewers should be responsible for catching errors like this, but realistically speaking, plenty of them can't. Perhaps journals should maintain a sublist of reviewers with some expertise in statistics and include at least one such reviewer per paper.

enbeh said...

I must admit that I usually do not check the stats for suspicious numbers when I review a paper. One reason is that I usually trust the authors, but another reason is that I truly think it is not my job. If the authors deliberately chose to commit fraud - so be it. I believe my role as a referee is to provide an expert opinion on the research as it is presented in the paper; my role is not to play science police.

Now all this is subject to debate. You may or may not have the opinion that it is the referee's job to check the validity of the statistical results. Or, as I saw in a recent post of yours, one may or may not find it is the referee's job to correct the authors' language mistakes.

What really annoys me all the time is that none of the journals I have ever reviewed for tell you what exactly they want your role to be. If they wanted you to check the stats (correct spelling mistakes, etc.), they could just say so. Instead, the "instructions" to the referees are either quite vague or non-existent.

Eric Charles said...

I think this all comes down to a time problem. It takes a surprising amount of time to prepare raw data for public posting. It takes a surprising amount of time to better evaluate methods and statistics. The time can be taken. My colleagues in Math routinely check every equation in papers they review, and this can take months. There is also much more credit given in Math for being a good reviewer than there is in psychology, which makes their time investment feasible. If we, as a field, cared enough about stopping these practices, we could improve the situation very quickly.

I don't understand why we are not more suspicious, by default, about too-good-to-be-true results. We should not knee-jerk assume fraud; methodological and statistical errors can be made honestly. Surely it is the job of the reviewer to catch those types of problems.

William said...

In some circumstances, there may also be a great deal of implicit pressure on a reviewer to not cause a crisis by flagging a problem.

A reviewer (who was being a jerk, in my opinion) was extremely critical of a paper a friend of mine submitted (it wasn't research, so it really was a difference of opinion), but by the time he made his objections known it was so close to publication time that it caused a major panic, and the problem was solved by consulting another reviewer.

Had it been research, and the problem was statistics, would it have been "solved" the same way? Rather than using the threat of shaming reviewers as a deterrent to rubber-stamping papers, what if the reviewing process happened earlier and pointing out problems was seen as helping rather than causing a crisis?

Neuroskeptic said...

enbeh: "What really annoys me all the time is that none of the journals I have ever reviewed for tell you what exactly they want your role to be."

I think that's very true. People (including editors) tend to assume that everyone shares the same idea of what peer review is.

I think I did that in this post also. I do see the reviewer's job as being science police and I assumed everyone did, but I see now that there are big differences of opinion on that score.

TheCellularScale said...

I think revealing reviewers' names would be good policy for a lot of reasons: not only the shame of letting a crap paper slip by, but also the credit for helping make a paper better.

Another problem with the peer review model is that it is moderately thankless. This would help fix that, and could serve as an alternative measure of one's influence in one's field.

I can see the problem with knowing who your reviewers are: if they are extremely harsh, it could inspire vengeful reviews. But I actually think that knowing your name will be public could inspire more unbiased honesty.

The Frontiers journals already do this, I think.

Zen Faulkes said...

These are the sorts of things reviewers are supposed to catch, but maybe reviewers are not given enough help to do so.

Once pointed out, the exact same standard deviation is a red flag. But a paper is a complex thing, and I can see how people can miss that.

Has anyone ever made a checklist of common errors and red flags? (Well, other than these.)

More people could be Batman if they had all his wonderful toys.

pj said...

One of the problems with dubious or suspicious stats in a paper is that it is almost impossible to ask the relevant questions.

No statistics are prima facie proof of fraud (I'm sure many of us have been in the position where a reviewer or co-author points out we've submitted the wrong numbers), so you then need to ask questions. In my experience the editors are uninterested and the authors are often obtuse and unhelpful. This leaves you in the difficult position of having to reject an article based on suspicions and half-answered questions, accept it when you have some reservations, or, if you start getting bolshy, watch the authors withdraw it and submit elsewhere.

Peer review is a pretty blunt instrument - I've had papers rejected on the basis of gross misunderstandings by the reviewers that would embarrass an undergrad, and papers accepted without question with either glaring (accidental) mistakes or seat-of-the-pants methods that should have at least raised a question or two.

Em said...

I like the idea of making reviewers' names public.

Also, perhaps a paid stats person should be assigned as a member of each review team. Maybe this is something quantitatively-inclined grad students could take up (after passing some qualifying test) for extra cash?

The rest of us should not be absolved from the duty of looking closely at the numbers as well, but having a specialist on each team might nonetheless be a good failsafe. After all, it may not always be realistic to expect that the researcher who best knows all the relevant background work will also be the ideal person to spot statistical anomalies or judge whether the use of some new technique is appropriate.