[2005] Why Most Published Research Findings Are False - John P. A. Ioannidis

Esther12

Senior Member
Messages
13,774
There's already been plenty of discussion of this paper, but I thought I'd pull out some bits for myself, and may as well make them public.

It's open access here, and the paper is not that long, so it might be a more worthwhile read than my attempt at a summary (this post is probably 1/4 - 1/3 the length of the article): http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0020124

Summary

There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.

He starts out with the problem of a lack of replication, and researchers too often putting their faith in the validity of a single positive study.

Goes on to lay out some ways of discussing the likelihood of research findings being true:

As has been shown previously, the probability that a research finding is indeed true depends on the prior probability of it being true (before doing the study), the statistical power of the study, and the level of statistical significance [10,11].
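
In the paper's framework (Table 1), this comes down to three numbers: the pre-study odds R that the probed relationship is true, the significance threshold α, and the type II error β (so power = 1 − β). Without bias, PPV = (1 − β)R / (R + α − βR). A minimal Python sketch of that calculation - the R values below are my own illustrative choices, not the paper's:

    def ppv(R, alpha=0.05, power=0.80):
        """Positive predictive value of a claimed finding, no bias:
        PPV = (1 - beta) * R / (R + alpha - beta * R)."""
        beta = 1 - power
        return (1 - beta) * R / (R + alpha - beta * R)

    # Pre-study odds R: 1:1 for a well-grounded hypothesis, down to 1:100
    # for exploratory testing (illustrative values, not from the paper)
    for R in (1.0, 0.1, 0.01):
        print(f"R = {R:>4}: PPV = {ppv(R):.2f}")   # 0.94, 0.62, 0.14

Even with decent power, once the pre-study odds drop, a nominal p < 0.05 "finding" is more likely false than true, which is the core of the paper's argument.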

Then moves on to the problem of bias:

Bias

First, let us define bias as the combination of various design, data, analysis, and presentation factors that tend to produce research findings when they should not be produced. Let u be the proportion of probed analyses that would not have been “research findings,” but nevertheless end up presented and reported as such, because of bias. Bias should not be confused with chance variability that causes some findings to be false by chance even though the study design, data, analysis, and presentation are perfect. Bias can entail manipulation in the analysis or reporting of findings. Selective or distorted reporting is a typical form of such bias. We may assume that u does not depend on whether a true relationship exists or not. This is not an unreasonable assumption, since typically it is impossible to know which relationships are indeed true. In the presence of bias (Table 2), one gets PPV = ([1 − β]R + uβR)/(R + α − βR + u − uα + uβR), and PPV decreases with increasing u, unless 1 − β ≤ α, i.e., 1 − β ≤ 0.05 for most situations. Thus, with increasing bias, the chances that a research finding is true diminish considerably. This is shown for different levels of power and for different pre-study odds in Figure 1. Conversely, true research findings may occasionally be annulled because of reverse bias. For example, with large measurement errors relationships are lost in noise [12], or investigators use data inefficiently or fail to notice statistically significant relationships, or there may be conflicts of interest that tend to “bury” significant findings [13]. There is no good large-scale empirical evidence on how frequently such reverse bias may occur across diverse research fields. However, it is probably fair to say that reverse bias is not as common. Moreover, measurement errors and inefficient use of data are probably becoming less frequent problems, since measurement error has decreased with technological advances in the molecular era and investigators are becoming increasingly sophisticated about their data. Regardless, reverse bias may be modeled in the same way as bias above. Also, reverse bias should not be confused with chance variability that may lead to missing a true relationship because of chance.
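
A small sketch of the bias-adjusted formula quoted above, showing how quickly PPV erodes as the bias proportion u grows (the pre-study odds R = 0.5 and 80% power are my own illustrative choices):

    def ppv_with_bias(R, u, alpha=0.05, power=0.80):
        """PPV in the presence of bias u:
        ([1 - beta]R + u*beta*R) / (R + alpha - beta*R + u - u*alpha + u*beta*R)."""
        beta = 1 - power
        num = (1 - beta) * R + u * beta * R
        den = R + alpha - beta * R + u - u * alpha + u * beta * R
        return num / den

    for u in (0.0, 0.1, 0.2, 0.4):
        print(f"u = {u:.1f}: PPV = {ppv_with_bias(R=0.5, u=u):.2f}")  # 0.89, 0.74, 0.64, 0.51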

Testing by Independent Teams: If lots of different teams are looking at an issue, and they and journals are interested in 'positive' results, then there is an increased chance of positive results being generated by chance, and then coming to affect people's view of reality. This is particularly a problem considering a general disinterest in replication: "Unfortunately, in some areas, the prevailing mentality until now has been to focus on isolated discoveries by single teams and interpret research experiments in isolation."
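
A back-of-envelope illustration of that point (my own, not from the paper): if n teams independently test the same relationship that is in fact null, each at α = 0.05, the chance that at least one of them gets a publishable 'positive' is 1 − (1 − α)^n.

    alpha = 0.05
    for n_teams in (1, 5, 10, 20):
        p_any_positive = 1 - (1 - alpha) ** n_teams
        print(f"{n_teams:>2} teams: P(at least one false positive) = {p_any_positive:.2f}")
    # 1 team: 0.05, 5 teams: 0.23, 10 teams: 0.40, 20 teams: 0.64

If only the positive results get written up, the literature then reflects those chance hits rather than the underlying null.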

Corollaries [my comments in italics]:

Corollary 1: The smaller the studies conducted in a scientific field, the less likely the research findings are to be true.

Corollary 2: The smaller the effect sizes in a scientific field, the less likely the research findings are to be true. [This made me think of a lot of CFS research which seems to dredge up small associations, and then tie them to some psychosocial theory. I think that this could also be related to the 'many teams' problem - I wonder how many researchers looked at predictors/associations for CFS, found nothing, and then published nothing?] : "Modern epidemiology is increasingly obliged to target smaller effect sizes [16]. Consequently, the proportion of true research findings is expected to decrease. In the same line of thinking, if the true effect sizes are very small in a scientific field, this field is likely to be plagued by almost ubiquitous false positive claims. For example, if the majority of true genetic or nutritional determinants of complex diseases confer relative risks less than 1.05, genetic or nutritional epidemiology would be largely utopian endeavors."
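
To give a feel for why relative risks around 1.05 are called 'utopian', here is a rough sample-size calculation using the standard normal-approximation formula for comparing two proportions (the 10% baseline risk, 80% power and two-sided α = 0.05 are my own illustrative assumptions):

    from scipy.stats import norm

    def n_per_group(p0, rr, alpha=0.05, power=0.80):
        """Approximate n per group to detect p1 = rr * p0 against p0
        with a two-sided two-proportion z-test (normal approximation)."""
        p1 = rr * p0
        z_a = norm.ppf(1 - alpha / 2)
        z_b = norm.ppf(power)
        return (z_a + z_b) ** 2 * (p0 * (1 - p0) + p1 * (1 - p1)) / (p1 - p0) ** 2

    print(f"RR 1.05: ~{n_per_group(0.10, 1.05):,.0f} per group")  # roughly 58,000
    print(f"RR 2.00: ~{n_per_group(0.10, 2.00):,.0f} per group")  # roughly 200

Studies a fraction of that size will mostly be underpowered, so the 'significant' results they do produce are disproportionately chance or bias.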

Corollary 3: The greater the number and the lesser the selection of tested relationships in a scientific field, the less likely the research findings are to be true. [Similar to above.]

Corollary 4: The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true. Flexibility increases the potential for transforming what would be “negative” results into “positive” results, i.e., bias, u. For several research designs, e.g., randomized controlled trials [18–20] or meta-analyses [21,22], there have been efforts to standardize their conduct and reporting. Adherence to common standards is likely to increase the proportion of true findings. The same applies to outcomes. True findings may be more common when outcomes are unequivocal and universally agreed (e.g., death) rather than when multifarious outcomes are devised (e.g., scales for schizophrenia outcomes) [23]. Similarly, fields that use commonly agreed, stereotyped analytical methods (e.g., Kaplan-Meier plots and the log-rank test) [24] may yield a larger proportion of true findings than fields where analytical methods are still under experimentation (e.g., artificial intelligence methods) and only “best” results are reported. Regardless, even in the most stringent research designs, bias seems to be a major problem. For example, there is strong evidence that selective outcome reporting, with manipulation of the outcomes and analyses reported, is a common problem even for randomized trials [25]. Simply abolishing selective publication would not make this problem go away. [Remind anyone of anything?]
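
A toy simulation of the flexibility problem (entirely my own illustration, not from the paper): give a null trial several outcome measures and let the write-up highlight whichever one came out best, and the effective false-positive rate lands far above the nominal 5%.

    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(0)
    n_trials, n_outcomes, n_per_arm = 2000, 8, 50

    false_positive_trials = 0
    for _ in range(n_trials):
        # Two identical arms: every outcome is pure noise, so the null is true
        treatment = rng.normal(size=(n_per_arm, n_outcomes))
        control = rng.normal(size=(n_per_arm, n_outcomes))
        p_values = [ttest_ind(treatment[:, j], control[:, j]).pvalue
                    for j in range(n_outcomes)]
        if min(p_values) < 0.05:   # report only the "best" outcome
            false_positive_trials += 1

    print(f"Nominal alpha 0.05, observed rate: {false_positive_trials / n_trials:.2f}")
    # Expect roughly 1 - 0.95**8, i.e. about 0.34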

Corollary 5: The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true. Conflicts of interest and prejudice may increase bias, u. Conflicts of interest are very common in biomedical research [26], and typically they are inadequately and sparsely reported [26,27]. Prejudice may not necessarily have financial roots. Scientists in a given field may be prejudiced purely because of their belief in a scientific theory or commitment to their own findings. Many otherwise seemingly independent, university-based studies may be conducted for no other reason than to give physicians and researchers qualifications for promotion or tenure. Such nonfinancial conflicts may also lead to distorted reported results and interpretations. Prestigious investigators may suppress via the peer review process the appearance and dissemination of findings that refute their findings, thus condemning their field to perpetuate false dogma. Empirical evidence on expert opinion shows that it is extremely unreliable [28]. [An important point that CFS patients are not allowed to make without being condemned for anti-psychiatry militancy.]

Corollary 6: The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true. [Similar to the 'independent teams' problem, but with the extra competition - this rang true to me, as teams were more interested in promoting work which contradicted the claims of other groups.] The term Proteus phenomenon has been coined to describe this phenomenon of rapidly alternating extreme research claims and extremely opposite refutations [29]. Empirical evidence suggests that this sequence of extreme opposites is very common in molecular genetics [29].

These corollaries consider each factor separately, but these factors often influence each other. For example, investigators working in fields where true effect sizes are perceived to be small may be more likely to perform large studies than investigators working in fields where true effect sizes are perceived to be large. Or prejudice may prevail in a hot scientific field, further undermining the predictive value of its research findings. Highly prejudiced stakeholders may even create a barrier that aborts efforts at obtaining and disseminating opposing results. Conversely, the fact that a field is hot or has strong invested interests may sometimes promote larger studies and improved standards of research, enhancing the predictive value of its research findings. Or massive discovery-oriented testing may result in such a large yield of significant relationships that investigators have enough to report and search further and thus refrain from data dredging and manipulation.

Most Research Findings Are False for Most Research Designs and for Most Fields

He uses his models to argue that most research findings are false. I may be missing something, but this seemed pretty speculative to me, and rather bold for a paper that's pointing out the undue confidence researchers often have in their work.

Claimed Research Findings May Often Be Simply Accurate Measures of the Prevailing Bias


I thought this was an interesting point:

Let us suppose that in a research field there are no true findings at all to be discovered. History of science teaches us that scientific endeavor has often in the past wasted effort in fields with absolutely no yield of true scientific information, at least based on our current understanding. In such a “null field,” one would ideally expect all observed effect sizes to vary by chance around the null in the absence of bias. The extent that observed findings deviate from what is expected by chance alone would be simply a pure measure of the prevailing bias.

...

Of course, investigators working in any field are likely to resist accepting that the whole field in which they have spent their careers is a “null field.” However, other lines of evidence, or advances in technology and experimentation, may lead eventually to the dismantling of a scientific field. Obtaining measures of the net bias in one field may also be useful for obtaining insight into what might be the range of bias operating in other fields where similar analytical methods, technologies, and conflicts may be operating.

How Can We Improve the Situation?


Appropriately targeted large scale studies:

Large-scale evidence should be targeted for research questions where the pre-study probability is already considerably high, so that a significant research finding will lead to a post-test probability that would be considered quite definitive. Large-scale evidence is also particularly indicated when it can test major concepts rather than narrow, specific questions. A negative finding can then refute not only a specific proposed claim, but a whole field or considerable portion thereof. Selecting the performance of large-scale studies based on narrow-minded criteria, such as the marketing promotion of a specific drug, is largely wasted research. Moreover, one should be cautious that extremely large studies may be more likely to find a formally statistical significant difference for a trivial effect that is not really meaningfully different from the null [32–34].
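
A quick Bayes' rule illustration of the 'post-test probability' point (the specific numbers are mine, not the paper's): even a large, well-powered study only delivers a near-definitive answer if the question already had reasonably high pre-study probability.

    def post_test_probability(prior, power=0.95, alpha=0.05):
        """P(relationship is true | statistically significant result)."""
        return power * prior / (power * prior + alpha * (1 - prior))

    for prior in (0.5, 0.1, 0.01):
        print(f"pre-study probability {prior:>4}: post-test {post_test_probability(prior):.2f}")
    # 0.5 -> 0.95, 0.1 -> 0.68, 0.01 -> 0.16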

Competing teams, changing culture of science, registering and adhering to trial protocols as ways of combating bias:

Second, most research questions are addressed by many teams, and it is misleading to emphasize the statistically significant findings of any single team. What matters is the totality of the evidence. Diminishing bias through enhanced research standards and curtailing of prejudices may also help. However, this may require a change in scientific mentality that might be difficult to achieve. In some research designs, efforts may also be more successful with upfront registration of studies, e.g., randomized trials [35]. Registration would pose a challenge for hypothesis-generating research. Some kind of registration or networking of data collections or investigators within fields may be more feasible than registration of each and every hypothesis-generating experiment. Regardless, even if we do not see a great deal of progress with registration of studies in other fields, the principles of developing and adhering to a protocol could be more widely borrowed from randomized controlled trials.

Finally, instead of chasing statistical significance, we should improve our understanding of the range of R values—the pre-study odds—where research efforts operate [10]. Before running an experiment, investigators should consider what they believe the chances are that they are testing a true rather than a non-true relationship. Speculated high R values may sometimes then be ascertained. As described above, whenever ethically acceptable, large studies with minimal bias should be performed on research findings that are considered relatively established, to see how often they are indeed confirmed. I suspect several established “classics” will fail the test [36].
Conclusion:

Nevertheless, most new discoveries will continue to stem from hypothesis-generating research with low or very low pre-study odds. We should then acknowledge that statistical significance testing in the report of a single study gives only a partial picture, without knowing how much testing has been done outside the report and in the relevant field at large. Despite a large statistical literature for multiple testing corrections [37], usually it is impossible to decipher how much data dredging by the reporting authors or other research teams has preceded a reported research finding. Even if determining this were feasible, this would not inform us about the pre-study odds. Thus, it is unavoidable that one should make approximate assumptions on how many relationships are expected to be true among those probed across the relevant research fields and research designs. The wider field may yield some guidance for estimating this probability for the isolated research project. Experiences from biases detected in other neighboring fields would also be useful to draw upon. Even though these assumptions would be considerably subjective, they would still be very useful in interpreting research claims and putting them in context.
A few of the references there look potentially interesting.

18-20: Attempts to improve reporting/reduce bias in trials.

I'm not sure if this one, or a similar one, has been discussed here before:

25. Chan AW, Hrobjartsson A, Haahr MT, Gotzsche PC, Altman DG (2004) Empirical evidence for selective reporting of outcomes in randomized trials: Comparison of protocols to published articles. JAMA 291: 2457–2465.

26, 27 on COI. There were no specific references for the ideological COIs, but again, I think I remember papers on this already being discussed here.

35. De Angelis C, Drazen JM, Frizelle FA, Haug C, Hoey J, et al. (2004) Clinical trial registration: A statement from the International Committee of Medical Journal Editors. N Engl J Med 351: 1250–1251.

Over and out.
 

Simon

Senior Member
Messages
3,789
Location
Monmouth, UK
Corollary 2: The smaller the effect sizes in a scientific field, the less likely the research findings are to be true. [This made me think of a lot of CFS research which seems to dredge up small associations, and then tie them to some psychosocial theory. I think that this could also be related to the 'many teams' problem - I wonder how many researchers looked at predictors/associations for CFS, found nothing, and then published nothing?] :

Corollary 3: The greater the number and the lesser the selection of tested relationships in a scientific field, the less likely the research findings are to be true. [Similar to above.]

Corollary 4: The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true. ...True findings may be more common when outcomes are unequivocal and universally agreed (e.g., death) rather than when multifarious outcomes are devised (e.g., scales for schizophrenia outcomes) [23].

Conclusion:
Despite a large statistical literature for multiple testing corrections [37], usually it is impossible to decipher how much data dredging by the reporting authors or other research teams has preceded a reported research finding.
Thanks.

Just to recap some of the points from Ioannidis that you were highlighting as applicable to CFS research:
  1. lots of findings of small effects supporting BPS theories of CFS
  2. Questionnaires/scales are themselves 'fuzzy' measures when compared to objective outcomes such as death
  3. which gives us small effects on fuzzy measures for CFS
  4. data-dredging amongst countless questionnaire questions vs the desired outcome (fit with BPS view) could generate false positives. Even more so when the authors fail to correct for multiple comparisons, as in this recent case highlighted by Dolphin (see the sketch after this list)
  5. so even some of these modest findings may be false positives that appear only because of data mining and/or failure to correct for multiple comparisons
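
On the multiple-comparisons point in (4), a minimal sketch of why correction matters (the p-values are made up for illustration): test 20 questionnaire items at a nominal 0.05 level, report whichever cross the line, and the 'findings' can vanish once even a simple Bonferroni correction is applied.

    # Hypothetical p-values from 20 questionnaire items tested against one outcome
    p_values = [0.004, 0.03, 0.04, 0.06] + [0.2] * 16
    alpha = 0.05
    bonferroni_alpha = alpha / len(p_values)            # 0.0025

    naive_findings = sum(p < alpha for p in p_values)                 # 3 "findings"
    corrected_findings = sum(p < bonferroni_alpha for p in p_values)  # none survive
    print(naive_findings, corrected_findings)
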
I am genuinely surprised that when years of searching have found no strong evidence to support the BPS view of CFS, the researchers haven't re-evaluated their theories. Their models assume that whatever triggers CFS it is perpetuated by faulty patient beliefs and behaviours resulting in major disability and dramatic loss of quality of life. Surely such a situation would throw up many detectable strong effects - if it were true.
 

Esther12

Senior Member
Messages
13,774
Thanks for the summary of my summary Simon.

Their models assume that whatever triggers CFS it is perpetuated by faulty patient beliefs and behaviours resulting in major disability and dramatic loss of quality of life. Surely such a situation would throw up many detectable strong effects - if it were true.

They can just keep shrinking the claimed effect size, while continuing to claim that these are important factors, and that only militant anti-psychiatry patients would complain about having the psychosocial aspects of their lives medicalised anyway. From their point of view, I don't see any advantage to pulling out. Do you really think Chalder would benefit from admitting how ineffective her treatments, or flawed her theories, were?

If we had to pin this all down to one problem it would be:

The research community for ME and CFS is too small.

I don't know. If more money is spent on CFS research, without there also being a real cultural change, and commitment to releasing more data, trying to measure more objective outcomes, and outcomes that patients care about, less of a focus on finding positive results, etc, etc... then I think that the problems we've already seen could just be expanded and worsened.

There's a real lack of solid starting points, and that means that those researchers who will be the most 'successful' are those working in areas most susceptible to the sort of problems that allow meaningless results to be registered as positive findings. More researchers would increase the chance of a genuine breakthrough, which could actually lead to real progress, but it would also be likely to lead to a massive increase in dross.

This bit interested me, not least because there's a constant thought in the back of my mind of 'this biopsychosocial stuff can't be pure quackery... surely someone would have noticed - what am I missing?':

History of science teaches us that scientific endeavor has often in the past wasted effort in fields with absolutely no yield of true scientific information, at least based on our current understanding.
 

Simon

Senior Member
Messages
3,789
Location
Monmouth, UK
Thanks for the summary of my summary Simon.


Their models assume that whatever triggers CFS it is perpetuated by faulty patient beliefs and behaviours resulting in major disability and dramatic loss of quality of life. Surely such a situation would throw up many detectable strong effects - if it were true.
They can just keep shrinking the claimed effect size, while continuing to claim that these are important factors.
I was hoping you might summarise my summary of your summary: 120 character tweet?

I'm not sure about that. Their model of perpetuation makes strong claims, implying strong effects; if they want to argue that the effect sizes are genuinely modest (as opposed to merely having failed to find big effects) then they have to radically modify their model, i.e. to the point that BPS factors alone (including any secondary biological factors) cannot explain CFS. They have yet to do so. I think the idea that BPS factors could contribute to but not cause perpetuation would be a great deal less controversial - and would be more in line with the views about other chronic illnesses. The issue then would be illness management, not cure.

From their point of view, I don't see any advantage to pulling out. Do you really think Chalder would benefit from admitting how ineffective her treatments, or flawed her theories, were?
You may have a point.

I wonder if it is different for clinician researchers. The optimist in me likes to think that most researchers would eventually get tired of heading down a blind alley (none of us would let go of a pet theory easily).
 

user9876

Senior Member
Messages
4,556
I thought this blog was quite a good analysis of how papers mislead.
http://jcoynester.wordpress.com/tag/breast-cancer/

It's talking about psychological treatments for breast cancer and claims made in papers that they prolong life. It turns out that the data didn't really support this: the results were cherry-picked after the end of the trial, and the statistics were poor as well, with the quoted mean overstating the results because of a single data point.
 

Valentijn

Senior Member
Messages
15,786
I am genuinely surprised that when years of searching have found no strong evidence to support the BPS view of CFS, the researchers haven't re-evaluated their theories.
The problem is that they are not researchers who are looking for answers. They are CBT/GET practitioners designing trials and presenting data in the manner which is most persuasive in convincing others (who don't look closely) that CBT/GET is the only effective treatment.
 

Esther12

Senior Member
Messages
13,774
I'm not sure about that. Their model of perpetuation makes strong claims, implying strong effects; if they want to argue that the effect sizes are genuinely modest (as opposed to merely having failed to find big effects) then they have to radically modify their model, i.e. to the point that BPS factors alone (including any secondary biological factors) cannot explain CFS. They have yet to do so. I think the idea that BPS factors could contribute to but not cause perpetuation would be a great deal less controversial - and would be more in line with the views about other chronic illnesses. The issue then would be illness management, not cure.

Look at how they're redefining 'recovery' outside of CFS too. There's now lots of talk about the most important health problem being people's 'resilience' to health problems, and willingness to not see themselves as ill. (Most important to who, I wonder?)

The PACE recovery definition may be particularly absurd, but it also fits in to a pattern of redefining the 'sickness role', and what good health means.

When words have no real meaning, people can get away with saying all sorts!