Thursday, 21 May 2009

Genes, Brains and the Perils of Publication

Much of science, and especially neuroscience, consists of the search for "positive results". A positive result is simply a correlation or a causal relationship between one thing and another. It could be an association between a genetic variant and some personality trait. It could be a brain area which gets activated when you think about something.


It's only natural that "positive results" are especially interesting. But "negative" results are still results. If you find that one thing is not correlated with another, you've found a correlation. It just happens to have a value of zero.
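To put that concretely, here's a minimal sketch (in Python, with made-up data - purely for illustration, nothing to do with any real study) of how a zero correlation is still a measurable, reportable result:

    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(0)
    gene_dosage = rng.integers(0, 3, size=200)   # 0, 1 or 2 copies of some variant
    trait_score = rng.normal(size=200)           # a trait unrelated to the gene

    r, p = pearsonr(gene_dosage, trait_score)
    print(f"r = {r:.3f}, p = {p:.3f}")           # r hovers near zero: a negative result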

For every gene which causes bipolar disorder, say, there will be a hundred which have nothing to do with it. So, if you find a gene that doesn't cause bipolar, that's a finding. It deserves to be treated just as seriously as finding that a gene does cause it. In particular, it deserves to be published.

Sadly, negative results tend not to get published. There are lots of reasons for this and much has been written about it, both on this blog and in the literature, most notably by John Ioannidis (see this and this, for starters). A paper just published in Science offers a perfect example of the problem: Neural Mechanisms of a Genome-Wide Supported Psychosis Variant.

The authors, a German group, report on a genetic variant, rs1344706, which was recently found to be associated with a slightly raised risk of psychotic illness in a genome-wide association study. (Genome-wide studies can and do throw up false positives so rs1344706 might have nothing to do with psychosis - but let's assume that it does.)

They decided to see whether the variant had an effect on the brains of people who have never suffered from psychosis. That's an extremely reasonable idea, because if a certain gene causes an illness, it could well also cause subtle effects in people who don't have the full-blown disease.

So, they took 115 healthy people and used fMRI to measure neural activity while they were doing some simple cognitive tasks, such as the n-back task, a fairly tricky memory test. People with schizophrenia and other psychotic disorders often have difficulties on this test. They also used a test which involves recognizing people's emotions from pictures of their faces.
They found that -
Regional brain activation was not significantly related to genotype...Rs1344706 genotype had no impact on performance.
In other words, the gene didn't do anything. The sample size was large - with 115 people, they had an excellent chance to detect any effect, if there was one, and they didn't. That's a perfectly good finding, a useful contribution to the scientific record. It was reasonable to think that rs1344706 might affect cognitive performance or brain activation in healthy people, and it didn't.
But that's not what the paper is about. These perfectly good negative findings were relegated to just a couple of sentences - I've just quoted almost every word they say about them - and the rest of the article concerns a positive result.

The positive result is that the variant was associated with differences in functional connectivity. Functional connectivity is the correlation between activity in different parts of the brain; if one part of the brain tends to light up at the same time as another part, they are said to be functionally connected.
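As a toy illustration of the concept (this is not the authors' pipeline, just the basic idea), functional connectivity boils down to correlating two regions' time courses:

    import numpy as np

    rng = np.random.default_rng(1)
    shared = rng.normal(size=300)                  # a common driving signal

    # Simulated BOLD time courses (300 volumes) for three "regions"
    region_a = shared + 0.5 * rng.normal(size=300)
    region_b = shared + 0.5 * rng.normal(size=300)
    region_c = rng.normal(size=300)                # independent of the others

    fc_ab = np.corrcoef(region_a, region_b)[0, 1]  # high: "functionally connected"
    fc_ac = np.corrcoef(region_a, region_c)[0, 1]  # near zero: not connected
    print(f"A-B: {fc_ab:.2f}, A-C: {fc_ac:.2f}")

Here's what the authors report: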
In risk-allele carriers, connectivity both within DLPFC (same side) and to contralateral DLPFC was reduced. Conversely, the hippocampal formation was uncoupled from DLPFC in non–risk-allele homozygotes but showed dose-dependent increased connectivity in risk-allele carriers. Lastly, the risk allele predicted extensive increases of connectivity from amygdala including to hippocampus, orbitofrontal cortex, and medial prefrontal cortex.
And they conclude, optimistically:
...our findings establish dysconnectivity as a core neurogenetic mechanism, where reduced DLPFC connectivity could contribute to disturbed executive function and increased coupling with HF to deficient interactions between prefrontal and limbic structures ... Lastly, our findings validate the intermediate phenotype strategy in psychiatry by showing that mechanisms underlying genetic findings supported by genome-wide association are highly penetrant in brain, agree with the pathophysiology of overt disease, and mirror candidate gene effects. Confirming a century-old conjecture by combining genetics with imaging, we find that altered connectivity emerges as part of the core neurogenetic architecture of schizophrenia and possibly bipolar disorder, identifying novel potential therapeutic targets.
I have no wish to criticize these findings as such. But the way in which this paper is written is striking. The negative results are passed over as quickly as possible. This despite the fact that they are very clear and easy to interpret - the rs1344706 variant has no effect on cognitive task performance or neural activation. It is not a cognition gene, at least not in healthy volunteers.

By contrast, the genetic association with connectivity is modest (see the graphs above - there is a lot of overlap), and very difficult to interpret, since it is clearly not associated with any kind of actual differences in behaviour.

And yet this positive result got the experiment published in no less a journal than Science! The negative results alone would have struggled to get accepted anywhere, and would probably have ended up either unpublished, or published in some rubbish minor journal and never read. It's no wonder the authors decided to write their paper in the way they did. They were just doing the smart thing. And they are perfectly respectable scientists - Andreas Meyer-Lindenberg, the senior author, has done some excellent work in this and other fields.

The fault here is with a system which all but forces researchers to search for "positive results" at all costs.


Esslinger, C., Walter, H., Kirsch, P., Erk, S., Schnell, K., Arnold, C., Haddad, L., Mier, D., Opitz von Boberfeld, C., Raab, K., Witt, S., Rietschel, M., Cichon, S., & Meyer-Lindenberg, A. (2009). Neural Mechanisms of a Genome-Wide Supported Psychosis Variant. Science, 324 (5927), 605. DOI: 10.1126/science.1167768

22 comments:

pj said...

A particular worry I have with these kinds of studies (well, over and above my doubts about the use of the BOLD signal as a measure of neural activity that can be compared across populations) is that they will likely have genotyped a number of potential risk alleles.

But, precisely because of the stupid way that scientific publication works (as you mention), you may never even see the negative associations - or get an idea of how likely a false positive association was.

Similarly, I worry about how many 'novel' analysis methods were used before this final, not particularly convincing, finding was settled on.

Their statement that they have validated the intermediate phenotype approach seems a little strong, both given that it has been used plenty of times before (cf. COMT and prefrontal cortex) and because their study is pretty poor evidence of an endophenotype as things stand (there's no patient data for a start).

I like the claim that they used a method robust to type I errors - because, if you read the ref they refer to, it found something rather interesting. They looked at 720 frequent SNPs not associated with schizophrenia at 5% alpha (Bonferroni corrected, and in their sample) and found that 492 of these showed no association with cognition (7 cognitive factors; it looks like permutation testing gave significance) at uncorrected 5% alpha*. They then claim that of these latter SNPs the rate of association with brain activation patterns was less than 5% - thus showing that standard neuroimaging methods lead to reasonable control of false positives when looking at gene associations.

What worries me is that nearly a third of the SNPs they looked at turn out to be associated with cognition!* This doesn't exactly fill one with confidence that the findings in the current paper are not spurious!


* strictly speaking it looks like these were either associated with cognition at 5% and/or associated with schizophrenia at an uncorrected 5% - but they didn't give the data for how many were associated with cognition but not schizophrenia, or vice versa.
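For anyone unfamiliar with it, the permutation idea itself is simple enough to sketch (toy numbers, nothing to do with their actual data):

    import numpy as np

    rng = np.random.default_rng(4)
    carriers = rng.normal(0.3, 1, size=60)     # toy data with a small real effect
    noncarriers = rng.normal(0.0, 1, size=60)

    observed = carriers.mean() - noncarriers.mean()
    pooled = np.concatenate([carriers, noncarriers])

    null = []
    for _ in range(10000):
        rng.shuffle(pooled)                    # break any link between group and score
        null.append(pooled[:60].mean() - pooled[60:].mean())

    p = np.mean(np.abs(null) >= abs(observed)) # two-sided permutation p-value
    print(p)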

pj said...

Another interesting bit from that citation - "there was no significant difference between rates for the 720 SNP panel and the supplementary analysis of 492 SNPs without any cognitive or schizophrenia association" - that is, if they included the SNPs that did show a relationship with cognition and/or schizophrenia, the rate of association with imaging signal did not change. Surely that's a bit concerning, since those SNPs were excluded precisely because they were supposed to be associated with cognitive function or disease - it looks like an even stronger argument that endophenotypes are just false positives?

Neuroskeptic said...

Of course, we have no idea how many alleles they tested for, or how many other analysis methods they used -

Just off the top of my head, the other analysis methods they could have used, assuming they had collected "normal" fMRI data, are:

* Brain structure differences (voxel-based morphometry)
* Dynamic Causal Modelling
* Pattern classification approaches
* BOLD-behaviour correlations (assuming they gave the participants a simple personality questionnaire or something)

But we really don't know. What made me write about this paper is that even the null results we do know about were effectively brushed under the carpet of the positive result.

pj said...

There's also 'region of interest' analyses, e.g. looking only at prefrontal cortex.

AnlamK said...

Neuroskeptic,

As a layman, I read your posts with interest. Nonetheless, I think the bias towards positive results is reasonable. I mean, after all, fruit intake, what kinds of shoes you wear, and all sorts of mundane details don't correlate with, say, the risk of mental illness in a person. And these are as uninformative as can be.

I guess what I am trying to say is that positive results (i.e. correlations) are informative (unless of course they are trivial...). Nonetheless, negative results are not informative except in certain special cases. (If someone found, for instance, that oxygen intake wasn't correlated with lung size, now THAT would be informative.)

It seems as if negative results are informative only if they challenge some pre-established result. But lacking that, do we really care? (I'll check out your links, too.)

Neuroskeptic said...

Oh yes, a positive result is definitely more exciting than a negative result in some ways.

Especially if the negative result is very predictable (no-one would expect that shoe type correlates with mental illness).

But the problem is that even important, non-predictable negative results don't tend to get published.

pj said...

They're important because we set our criterion for what is a positive result at, say, 5% alpha (i.e. we're sort of saying that a result counts as significant if there's only a 5% chance of getting it when there's no real effect).

The result of that is that if 20 people look at the same (non-existent) effect, one of them will find what looks like a positive result, even though it isn't. Add onto that data dredging and multiple testing, and maybe 1 in every 5 people studying something is going to get a false positive.
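A quick simulation makes the arithmetic concrete (Python, with made-up group sizes - just an illustration of the 1-in-20 point, not anyone's actual data):

    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(2)
    n_labs, false_positives = 1000, 0
    for _ in range(n_labs):
        carriers = rng.normal(size=50)       # no true group difference at all
        noncarriers = rng.normal(size=50)
        t, p = ttest_ind(carriers, noncarriers)
        if p < 0.05:
            false_positives += 1

    print(false_positives / n_labs)          # ~0.05: about 1 "lab" in 20 finds an effect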

If you're studying something like the prefrontal cortex, or using prefrontal tasks (like the n-back) then there's more than enough people studying it to produce several false positive studies. If those true negative studies by the other people are not published, or published in such a way that the results are difficult to find, or aren't noticed, then people working in the field will get a skewed view of what the field as a whole shows.

This leads to scientists spending years wasting time studying phantom phenomena that only look real in the literature but, in fact, aren't.

Anonymous said...

I think Goldacre sums up the importance of negative results nicely in Bad Science. A drug is being tested for cancer; nine trials take place which find no effect and remain unpublished. Then one finds an effect and this is published, much to the delight of the pharma company.

The significance of negative results in psychology/neuroscience is slightly less obvious, but when we're dealing with psychopathology and, more broadly, human nature, the negative results are still important; certainly I want to know.

As a cheeky aside, I would wager that shoe size does correlate with mental illness. Given that the age of onset for schizophrenia is about 20, I bet most of the people with it have got comparably (to children) big feet. Equally, I bet you would find a good negative correlation for ADHD, and a really tight positive correlation for eating disorders, which start coming in around 8 or 9 and reach their peak prevalence at about the age of 20. Absolutely pointless comment, but that about sums my contributions up generally.

G.

Grumpy, M.D. said...

Agree. Problem is that a lot of funding tends to focus on finding positive results.

bsci said...

I meant to comment on this a while back. I think you misunderstood the point of the negative finding. It isn't actually a negative finding. It's a key element to the article's positive finding.

The article is trying to show that there is unique information in the connectivity analysis. If region A and the connections between regions A & B both differ based on the gene, it's difficult to say whether the differences between regions A & B are due to connectivity or just to differences at A. If there are no differences in regions A or B, and only the connections between the two are different, then it is possible to say that the gene is affecting the connections between the regions.

The same is partially true for performance issues. If a behavioral test had enough power to detect what the connectivity study showed, then the diagnostic/predictive power of the connectivity analysis would be less significant and this probably wouldn't be in Science.

That said, I do wish there was more willingness to publish actual negative findings in high quality journals.

pj said...

Of course a connectivity analysis doesn't tell us anything about connectivity - only activity correlations.

And the multiple testing argument still stands - if they'd found a significant effect of genotype on activity, would they have trumpeted the connectivity findings?

They report activity correlations between different hemispheres in prefrontal cortex, and between amygdala and hippocampus, orbitofrontal cortex, and medial prefrontal cortex. I hate to think how many correlations they looked for.

Also, why this one gene? Why not any other genes associated with schizophrenia? These studies take a long time to do, and you are going to re-use the samples to check different genes (Meyer-Lindenberg was involved in the NIH schizophrenia study which very much uses this approach). But we aren't told how many risk alleles were tested until this effect was found.

Plus there was no effect of genotype on performance, and there has been no study in schizophrenia indicating that this proposed endophenotype even has anything to do with it.

You find that, when you've done it for a while, science turns out to be filled with smoke and mirrors.

pj said...

"It isn't actually a negative finding. It's a key element to the article's positive finding."I meant to say, in other words, while this sort of statement is often spun by study authors, nobody believes it.

bsci said...

PJ, You are correct that if they had found regional effects without the connectivity analysis, then the connectivity analyses wouldn't have been as relevant, but I'm not sure what your point is. They definitely could have published regional findings if they had found them.

If they had found neither regional nor connectivity effects, it would have been a true negative finding (and, alas, probably not published), but they did find connectivity effects and that's what they are publishing.

I'm not sure what you are saying nobody believes. If both regional and connectivity effects were discovered then the regional effects would probably be more highlighted. The fact that there were no regional effects, but there were connectivity effects is what makes the connectivity results interesting here.

While I can also pick apart the study, I doubt they were looking for too many other correlations. The supplementary materials note that they seeded the correlations from the regions that are most active during the n-back task, and they found connectivity differences in many of those regions. While there are many different ways to do connectivity analyses, they are using a fairly standard method that doesn't seem to include anything odd. (Note that some of the methods Neuroskeptic lists, like Dynamic Causal Modelling, require hypothesized models of connections between a limited set of regions, which opens up many more multiple testing problems than a method that compares a seed region to the whole brain.)
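In outline, a standard seed-based analysis is nothing exotic. Roughly this (a generic sketch with invented dimensions, not their actual code):

    import numpy as np

    rng = np.random.default_rng(3)
    n_volumes, n_voxels = 300, 10000
    brain = rng.normal(size=(n_volumes, n_voxels))  # fake whole-brain time series
    seed = brain[:, :20].mean(axis=1)               # mean time course of a "seed ROI"

    # Correlate the seed with every voxel in one vectorised step
    seed_z = (seed - seed.mean()) / seed.std()
    brain_z = (brain - brain.mean(axis=0)) / brain.std(axis=0)
    conn_map = brain_z.T @ seed_z / n_volumes       # one Pearson r per voxel
    print(conn_map.shape)                           # (10000,): the connectivity map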

This data set will probably be used for other gene tests, but I don't see what's wrong with that. If they tested 10 other genes, ignored those findings and published this one, I'd worry, but I doubt that is the case. If it was, they definitely should have noted it in the article.

I don't get why you are calling this smoke and mirrors. It's creating hypotheses and testing them. Sometimes positive results turn out false upon further analysis and sometimes weak findings get more support. It's science and it's messy. This very well might turn out to be a spurious finding. If it is, then other tests of this gene won't find similar results. If it's real, then more research will support these findings with other analysis methods and data sets.

Neuroskeptic said...

An intelligent debate on my very own blog! I'm so proud.

I share pj's worries about studies like this. The problem is that you never know how many tests were performed, found no result, and weren't published. In this case, the possible tests include: other region-region connectivity analyses; other analysis methods; other genes; and any combination of the above.

Now we don't know whether any of these were performed. Maybe they weren't, but this is the problem - we just don't know.

Maybe they only did exactly what they report in the paper, and they designed the experiment with exactly that analysis in mind.

Or maybe it was all post-hoc data dredging until they found something publishable.

I genuinely don't know which it was in this case, but I know the latter happens. And in this case you can see exactly how it could come about - they search for activity or behavioural effects, find none, then do connectivity, which, happily, can be done on a bog-standard fMRI data-set they already have, unlike, say, a diffusion tractographic analysis, which would probably be more informative.

pj said...

I am well aware that every time a new paper claiming an association between a novel allele and schizophrenia is published, all these groups with this functional data go back and look to see if they can find an effect of the new allele.

Now I'm glad they do this because it would be an absolute waste if they ignored this mass of existing data collected over decades (in some cases). What worries me is that we are only given a partial picture of this process because they will only publish if they find an effect.

I know that the file-drawer effect is well known and thus boring. But I think it helps us all to keep our sense of perspective to constantly reiterate that this is a real problem.

With reference to the brain regions they looked at in this study - sure, they 'seeded' regions of interest, but they correlated with every other brain region to find an effect. It isn't clear from the supplementary material exactly what they did, but it seems they either report on n-back related correlations with a range of unrelated areas (amygdala, orbitofrontal) or they are reporting correlations from the faces task that weren't significant. If this was an a priori hypothesis (and we know it wasn't because, bizarrely, they claim there were no regions implicated a priori in the faces task!), they would have just tried to correlate region A with region B - not the whole brain.

Basically, scientists are big fat liars - all of them - we mustn't forget that, while still keeping an active interest in what the totality of the literature shows, however potentially partial that may be.

bsci said...

Neuroskeptic,

I'm not overly concerned about other analyses looking for the same thing. Generally, if a group has good scientists, several analysis methods or slight variations of the existing method are tried, and if the results radically change depending on analysis method, then they are examined more closely before publication. There's no reason to report every variation, but part of the trust of science is that they made a reasonable effort to prove their finding wrong. (Obviously, sometimes trust is misplaced.)

For the testing of other genes, this gets more complex. While negative findings are important, should labs keep a running tally of every gene they tested on each data set and include it in every publication? On some level, for these large datasets, it's clear this should be public knowledge, but the current 5-15 page manuscript system isn't a great mechanism for this. These analyses are still time consuming and not completely automated, so I doubt they've run this data against even 10 other genes. Still, it would be nice to know whether they recruited these subjects based on this gene or whether this is a general large data set that is going to be used for many studies.

As for dredging the data, I don't uniformly consider this a bad thing. If your taxpayer dollars are being spent collecting this data, do you want people to do only the superficial analyses, or one or two things, and then throw away the data? If there's something relevant in the data, you want to find it. When I hear data dredging in the pejorative sense, it usually means taking a data set that turned out to have really nothing interesting and then keeping searching for an insignificant but publishable finding. Identifying connectivity differences in regions that one might expect, given the link to the disease, is not insignificant.

Also, regarding your comment on diffusion tractography: their fMRI tasks were relatively short. I wouldn't be very surprised if they have also collected diffusion MRI data that will be used in future publications. (If you ask why not in this publication, you are underestimating how time consuming each of these analyses is.)

bsci said...

PJ,

As I wrote to Neuroskeptic, I agree that it would be nice if every negative gene/functional-data result was written down somewhere, but this isn't something that fits in the manuscript format. Other ideas on how to do this, and how to separate interesting negative findings from irrelevant ones, are welcome.

I should also note that there are very few groups sitting on 100+ subject fMRI data sets with gene profiles. This current data set is extremely new and probably couldn't have been collected even 5 years ago. This is a good general resource that will help inform a lot of other researchers. Even the obviously "file-drawer" projects that come from this can help inform other researchers and point them towards interesting directions for studies explicitly focused on specific genes. If you take each paper as an independent research finding, there's some truth to your criticisms, but, in reality, they are part of an ongoing conversation.

I think you are also confused about what is or is not "a priori." The fact that they chose to run these two tasks meant that they were going to look specifically at brain regions that are activated during these two tasks that have some relationship to changes with schizophrenia and bipolar disorder. The selected brain regions did show significant signal changes during the task (see the ROI selection method), but they didn't show a significant difference based on the gene.

Of course, if all scientists are liars, you can just ignore what I write. :)

pj said...

"I think you are also confused about what is or is not "a priori." The fact that they chose to run these two tasks meant that they were going to look specifically at brain regions that are activated during these two tasks that have some relationship to changes with schizophrenia and bipolar disorder."But, they then looked at any brain region they could find that correlated with their a priori region - a true a priori analysis would have picked two or three regions and looked at the effect of genotype on the correlations between those regions.

And my point was that, bizarrely they claimed to have no a priori regions of interest for the faces task, so compared all regions to all others. Now I, like I'm sure you, can think of a whole load of brain regions implicated in their faces task, so I'm somewhat sceptical about this claim.

Looks like the difference here is that you have a rather optimistic view of science, while I take the contrary view. Perhaps I'm just getting cynical in my old age - or perhaps it is just bitter experience of the realpolitik of science.

Neuroskeptic said...

Yes, I think there's an element of the duck-rabbit illusion in cases like this. Some people see exciting scientific results. Others see clever data manipulation. I'm not sure there's any way of knowing which is right (the authors themselves might know).

Again, what bothers me, as I've said, is that there is no way of knowing. I don't think all scientists are big fat liars. But the problem is that anyone could be, and the current publication process would let them get away with it.

pj said...

I more meant that all scientists write their papers from the point of view of the results they find - not the hypotheses they were pursuing - so their papers already give a very partial view of the research they performed.

I'm as guilty of that as the next man. But it behooves us not to take everything in the literature at face value.

bsci said...

PJ,
Where do they claim they had no a priori ROIs for the face task? The supplementary material states, "ROIs were defined a priori as DLPFC and HF for the n-back task2,4 and amygdala for the face matching task4,6." There might be other regions worth studying, but they cite two references, including one from their past work, as reasons why they thought the amygdala was relevant.

Also, the purpose of this type of analysis is to see where a seed significantly correlates with the rest of the brain. Statistical thresholding is set to account for multiple comparisons, so I see no reason to limit this to studying only pairs of regions (and having to keep a separate tally of pairs tested to correct for multiple comparisons).
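For example, one common way of setting such a threshold is false discovery rate control - I'm not claiming this is the exact procedure in the paper, just the flavour of the idea:

    import numpy as np

    def bh_threshold(p_values, q=0.05):
        # Benjamini-Hochberg FDR: largest p_(k) such that p_(k) <= k*q/m
        p = np.sort(np.asarray(p_values))
        m = len(p)
        passing = np.where(p <= np.arange(1, m + 1) * q / m)[0]
        return p[passing[-1]] if passing.size else 0.0   # 0.0: nothing survives

    rng = np.random.default_rng(5)
    null_ps = rng.uniform(size=10000)   # 10,000 voxel-wise p-values, all null
    print(bh_threshold(null_ps))        # almost always 0.0: no voxels survive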

Presenting the initial hypothesis and history of a study, versus just the significant findings, is a real issue, but it is more a general issue of how to communicate discoveries than one of lying. There's also no easy solution. I want to read what people found. If what they didn't find is relevant, I want to read it too, but if their hypothesis evolved as they started to analyze the data, I don't need to read a full chart of the thought process in every publication.

Neuroskeptic,
I think the issue of who is or is not lying has little to do with the publication process and a lot more to do with the person. Short of requiring independent replication of every study, we need to trust that people are telling the truth. Someone could include every analysis run and every negative result and still be lying. My own standard when reviewing articles is that it should be possible to replicate a study given the presented information. If an article meets that standard but there are methodological problems, or the data doesn't match the description of the methods, then I don't accept it.

pj said...

I was referring to the statement:

"Post-hoc correlation analyses (partial correlation, two sided, site as nuisance covariate)
between behavioral and connectivity measures were performed...For the FMT, because there were no predefined regions for connectivity, we performed
correlation analyses of performance measures with each voxel in the connectivity map."
But it is a bit unclear to be honest.