First results from psychology’s largest reproducibility test

A.B.
Even if a finding is replicated, it doesn't mean the interpretation is correct.

For example, in the PACE trial, if a person answered "yes" when asked if they feared a worsening of symptoms following exertion, they would be labeled as having exercise phobia, because that was the authors' interpretation.

There are many reasons why someone could answer this question with a yes, and discarding all other possible explanations in favor of a single one is a leap of faith, not science.
 

Simon
I thought this looked pretty interesting, and hats off to psychologist Brian Nosek of the Center for Open Science for setting up this study. It's not as if psychology is the only field with a problem: there have been failed replication attempts in cancer biology and drug discovery, and there are precious few replications amongst the hundreds of biomedical ME/CFS findings.

On to the results of the replication attempts of 100 different studies published in 3 different psychology journals in 2008:
  • 39% of findings were replicated
  • Up to 63% if you move the goalposts a bit, though that's still hardly compelling:
Of the 61 non-replicated studies, scientists classed 24 as producing findings at least “moderately similar” to those of the original experiments, even though they did not meet pre-established criteria, such as statistical significance, that would count as a successful replication.

Broadly, the results support John Ioannidis's claim that "Most Published Research Findings Are False".

Anyway, these results are provisional: a paper is currently under review at the prestigious journal Science.

The 'near misses':
Rigid, all-or-nothing categories are not useful in such situations, says Greg Hajcak, a clinical psychologist at Stony Brook University in New York. Hajcak authored one study that could not be reproduced, but for which replicators said they found “extremely similar” results that did not reach statistical significance.
I think there might be something in this*. One possible interpretation is that the effect is real but so small that it is hard to detect, so some studies find the pattern without reaching significance. So it may be that the result first reported is 'real', but not worth getting excited about because it's not a big deal - which again is useful information.
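
To see why (a toy simulation of my own, with assumed numbers, not anything from the article): suppose the true effect is a 'trivial' correlation of r = 0.2 and each study samples 50 people. Then most studies see the effect in the right direction, but only a minority reach significance:

```python
# Toy simulation (my own illustration, with assumed numbers): a small true
# correlation with typical sample sizes usually shows up in the right
# direction, but often without reaching p < 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_r = 0.2       # assumed small ("trivial") true correlation
n = 50             # assumed per-study sample size
n_studies = 5_000

cov = [[1.0, true_r], [true_r, 1.0]]
significant = 0
right_direction = 0
for _ in range(n_studies):
    # draw n (x, y) pairs from a bivariate normal with correlation true_r
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    r, p = stats.pearsonr(x, y)
    right_direction += r > 0
    significant += p < 0.05

print(f"effect in the original direction: {right_direction / n_studies:.0%}")  # ~90%
print(f"statistically significant:        {significant / n_studies:.0%}")      # ~30%
```

On those assumptions a sea of 'near misses' is exactly what a real-but-tiny effect would produce.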

Above all, work like this (Reproducibility Project: Psychology) might help clean up the literature so that researchers can focus on findings that are both real and big enough to bother with.

*on the other hand, most researchers would assume the published result meant that if they tried to replicate the study they would get a significant result again - and that didn't happen.

Original article: First results from psychology’s largest reproducibility test : Nature News & Comment
 

Simon
This has now been published in Science, with a readable report about it in Nature:
Over half of psychology studies fail reproducibility test : Nature News & Comment

The first hard evidence to support John Ioannidis's claim that "Most Published Research Findings Are False".
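
To see why that claim is even arithmetically plausible, here's a back-of-envelope sketch (my numbers are assumed for illustration, not taken from Ioannidis's paper or from this study): the share of significant findings that are real depends on power, the false-positive rate, and how often tested hypotheses are true in the first place.

```python
# Back-of-envelope sketch of Ioannidis's argument, using numbers I've
# assumed for illustration (not from his paper or from this study).
alpha = 0.05   # false-positive rate for a single test
power = 0.35   # assumed typical power to detect a real effect
prior = 0.10   # assumed fraction of tested hypotheses that are actually true

true_pos = power * prior          # real effects that come out significant
false_pos = alpha * (1 - prior)   # null effects that come out significant
ppv = true_pos / (true_pos + false_pos)
print(f"significant findings reflecting a real effect: {ppv:.0%}")  # ~44%
```

With those assumptions, more than half of 'significant' findings are false positives - and the figure only gets worse with lower priors or added bias.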

Crucially, this isn't a one-off study, but a systematic replication of 100 studies drawn from 3 different journals, using materials provided by the original authors, reviewed in advance for methodological fidelity, and with high statistical power [= big samples!] to detect the original effect sizes.

  • Whereas 97% of the original studies found a significant effect, only 36% of replication studies found significant results.
  • The team also found that the average size of the effects found in the replicated studies was only half that reported in the original studies. "The mean effect size (r) of the replication effects (Mr = 0.197) was half the magnitude of the mean effect size of the original effects (Mr = 0.403)". An effect size of 0.2 or less is usually regarded as trivial.
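
For what it's worth, here's one mechanism (my own toy simulation, not an analysis from the paper) that would produce exactly this pattern: if journals mostly publish significant results, published effect sizes get inflated 'winner's curse' style, and a faithful replication of the same true effect will come in much smaller.

```python
# Toy simulation of significance filtering (my own sketch, not from the
# paper): if only significant, positive results get "published", published
# effect sizes are inflated relative to the true effect, and an honest
# replication of the same effect comes in smaller.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_d = 0.2   # assumed small true effect (standardized mean difference)
n = 30         # assumed per-group sample size

published, replicated = [], []
while len(published) < 1_000:
    a = rng.normal(true_d, 1.0, n)   # "treatment" group
    b = rng.normal(0.0, 1.0, n)      # control group
    t, p = stats.ttest_ind(a, b)
    if p < 0.05 and t > 0:           # the publication filter
        published.append(a.mean() - b.mean())
        # faithful replication of the same true effect, with no filter
        a2 = rng.normal(true_d, 1.0, n)
        b2 = rng.normal(0.0, 1.0, n)
        replicated.append(a2.mean() - b2.mean())

print(f"true effect:             {true_d:.2f}")
print(f"mean 'published' effect: {np.mean(published):.2f}")   # ~0.6 (inflated)
print(f"mean replication effect: {np.mean(replicated):.2f}")  # ~0.2
```

On these assumptions the filtered 'literature' overstates a trivial true effect roughly threefold, while the unfiltered replications recover it - consistent with the halving seen in the actual replication study.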

John Ioannidis, an epidemiologist at Stanford University in California, says that the true replication-failure rate could exceed 80%, even higher than Nosek's study suggests. This is because the Reproducibility Project targeted work in highly respected journals, the original scientists worked closely with the replicators, and replicating teams generally opted for papers employing relatively easy methods — all things that should have made replication easier.

Commenting on the quality of this new replication study, Andrew Gelman, a statistician at Columbia University, said: “This is empirical evidence, not a theoretical argument. The value of this project is that hopefully people will be less confident about their claims.”

Although this study, led by psychologist Brian Nosek, looks at psychological research, he believes that other scientific fields are likely to have much in common with psychology (other fields are known to have problems with replication too, eg only 6 of 53 promising preclinical cancer studies had results that replicated). Ioannidis would no doubt agree.