
False Positive Psychology

JaimeS

Senior Member
Messages
3,408
Location
Silicon Valley, CA
False-Positive Psychology:
Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant
by: Joseph P. Simmons, Leif D. Nelson, Uri Simonsohn

Note: this is a 2011 article, not a new one

http://journals.sagepub.com/doi/full/10.1177/0956797611417632

It's a fascinating one, and very applicable to what we've found...

Abstract:

In this article, we accomplish two things. First, we show that despite empirical psychologists’ nominal endorsement of a low rate of false-positive findings (≤ .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis. Second, we suggest a simple, low-cost, and straightforwardly effective disclosure-based solution to this problem. The solution involves six concrete requirements for authors and four guidelines for reviewers, all of which impose a minimal burden on the publication process.

Keywords: methodology, motivated reasoning, publication, disclosure

____________________________________

I couldn't find this article posted anyplace; if there's an old thread on it, please let me know.
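
To make the "flexibility" point a bit more concrete, here's a rough little simulation of my own (not from the paper; the sample sizes are just made up) of a single researcher degree of freedom: peeking at the data and adding more participants if the first test isn't significant. Both groups come from the same distribution, so every "significant" result is a false positive.

Code:
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def one_null_study(n_initial=20, n_extra=10, alpha=0.05):
    # Two groups drawn from the SAME distribution, so any significant
    # difference is a false positive.
    a = rng.normal(size=n_initial)
    b = rng.normal(size=n_initial)
    p = stats.ttest_ind(a, b).pvalue
    if p >= alpha:
        # "Peek" at the data, collect a few more participants, test again --
        # one of the degrees of freedom the paper discusses.
        a = np.concatenate([a, rng.normal(size=n_extra)])
        b = np.concatenate([b, rng.normal(size=n_extra)])
        p = stats.ttest_ind(a, b).pvalue
    return p < alpha

rate = sum(one_null_study() for _ in range(10_000)) / 10_000
print(f"False-positive rate with one peek: {rate:.3f}")  # noticeably above 0.05

Even that one small liberty pushes the error rate noticeably past the nominal 5%; stack a few such choices together and, as the paper shows, it climbs much higher.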
 

Alvin2

The good news is patients don't die the bad news..
Messages
2,997
This post reminds me of a calculator I saw somewhere (I can't remember where, or I'd post it here): you choose the president at the time and a few other variables, and it "proves" an economic policy works by calculating statistical significance. I wish I could find it.
The point is that science should be a search for the truth, but alternative-facters manipulate it to get the answers they want: through coincidences, massaging data, claiming correlation equals causation, and outright fraud.
We need to devise better ways to weed this out; the current methods assume people are acting in good faith, which is often not the case.
 

RogerBlack

Senior Member
Messages
902
I disagree with:
despite empirical psychologists’ nominal endorsement of a low rate of false-positive findings (≤ .05)
This is a common misunderstanding of what P values mean, and is a massive part of the problem.

A p-value of 0.05 does NOT mean that the result is likely to be in error one time in 20.

It depends on the real probability of the thing.

If you have a shiny paper claiming that an intervention works because the p-value says there is only a 5% chance that random fluctuation would produce the result in question, that does not actually answer what you care about - 'is the claimed effect real?'

If I perform a nice double-blind trial testing walking after taping different words to people for a week, where they can't see them, then in about 5% of such trials I'll get a positive result.

If I perform a similar trial on people's ability to walk, where my hypothesis is that cutting people's legs off worsens their ability to walk, my weak test may find that my hypothesis is supported, with a p-value of .05.


You need to start with some expected probability in order to go from p = 0.05 ('there is a 5% chance that this trial's result could have occurred by chance') to the thing we actually care about ('there is an x% chance this result is real').

If we start off with a very low probability something is real, the chance of a positive result being false is essentially certain.
If we start off with a high probability, then the chance of a false positive approaches 5%.

https://aeon.co/essays/it-s-time-for-science-to-abandon-the-term-statistically-significant
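
To put rough numbers on that (a back-of-the-envelope sketch of my own; the prior and power values are invented for illustration):

Code:
def prob_finding_is_real(prior, power=0.8, alpha=0.05):
    # Positive predictive value: the chance that a 'significant' result
    # reflects a real effect, given the prior probability the hypothesis
    # is true, the test's power, and the significance threshold.
    true_positives = prior * power
    false_positives = (1 - prior) * alpha
    return true_positives / (true_positives + false_positives)

for prior in (0.01, 0.1, 0.5, 0.9):
    print(f"prior {prior:>4}: P(effect is real | p < 0.05) = {prob_finding_is_real(prior):.2f}")
# prior 0.01 -> ~0.14, prior 0.1 -> ~0.64, prior 0.5 -> ~0.94, prior 0.9 -> ~0.99

So for an implausible hypothesis most "significant" findings are false, while for a plausible one the chance of a false positive ends up in the region of the nominal 5% or below.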
 

alex3619

Senior Member
Messages
13,810
Location
Logan, Queensland, Australia
A p-value of 0.05 does NOT mean that the result is likely to be in error one time in 20.
This is most obviously the case with fraudulent results: the probability estimate is invalid because the chance the effect is real is close to zero. Similarly, if the results are entirely due to failures in methodology that introduce bias, the finding can be statistically significant and yet still be entirely due to that bias. The probability that the hypothesis being tested is right could still be very, very low.

If I recall correctly, the current arguments based on prior probabilities put the share of wrong findings at more like a third to a half. P-values are about results, but biased results can show good p-values.

In psychiatry I suspect it is more like ninety percent.
 

RogerBlack

Senior Member
Messages
902
If I were feeling more energetic, I would now go and look at the effect sizes of placebos and compare them with the effect sizes seen in self-reported outcomes.
 
Philipp

Messages
80
If we start off with a very low probability something is real, the chance of a positive result being false is essentially certain.
If we start off with a high probability, then the chance of a false positive approaches 5%.

Thank you very much for your post!

I need some more time to wrap my head around the basic concept here, and probably to sleep on it, but my point is: I have never seen any kind of reference to, or acknowledgement of, what you just wrote in any study I have ever read. Maybe some implicit stuff where the author states something like 'the problem we are working on is believed to roughly have to do with this and that', but not much more.
Granted, I am a layperson who mostly just skims abstracts, but even as such an abstract-skimming non-scientist I would have considered the study that was posted in the OP of this thread to look pretty solid and, at the very least, well-intentioned. If the people who did this work do not fully understand this stuff, I imagine a lot of the people who e.g. do a bit of medical research as part of their MD programs won't fully understand it either. This may matter, because quite a lot of the 'cheap labour' in university clinic labs is done by those people. As far as I understand it, their thesis supervisors usually take care of a lot of the work if these projects somehow end up being published as formal studies, but they will have other obligations, will occasionally not care about some of the projects, may not understand statistics all that well themselves without realizing it, etc. So there are quite a few avenues for 'unsolid' stuff to creep in.


So, in an attempt to structure my thoughts a bit:


- How meaningful a p-value is depends on how meaningful the method and data it was calculated from were in the first place

- The p-value itself does not tell us anything at all about how meaningful the collected data are

- Actually figuring out how meaningful collected data are is really hard, even for professionals

- The a priori probability of an event/effect/etc. is what we want to find out in pretty much any scientific endeavour, but we usually cannot directly measure anything that gives us the underlying probability distribution, because we do not know how reliable the results of any measurement are (if we instantly knew this after measuring something, I assume there would not have been a need to introduce p-values in the first place)

- We want to find answers to questions we do not already know the answer to


Now to the part I am sure I don't really grasp:

- You have to make at least some kind of an educated guess at the real probability of whatever you want to look at (for reasons I do not completely get right now and therefore cannot accurately put into words)
- It may be possible to be really unlucky when you start working on a problem based on false assumptions, or on methodology that is unreliable without your knowing it, so that everything you do while working on a certain topic gets skewed in the wrong direction by initial results that are far from the truth
- This may not be corrected if you are not working on a problem repeatedly, or maybe not even if you do?


So let me make up a scenario:

'Cancer can be appropriately treated by talking to cancer patients in a certain manner' is the hypothesis we want to test (assuming that 'talking to someone in a certain manner' could be standardized, that all cancers and cancer patients are more or less equal, and that 'appropriate treatment' is somehow a clearly defined and agreed-upon thing - ignoring for the moment that there is something wrong with all of that).

There are 2 fictional groups doing the science on this hypothesis.

The first group starts off with the assumption that the hypothesis has a 0% chance of being true.

The second group starts off with the assumption that the hypothesis has a 100% chance of being true.

Neither group actually knows beforehand how good the available methods are for testing the hypothesis (because both start from scratch).

Neither group is allowed to talk to the other.



-> They can both be off, but they cannot possibly both be right.

-> How likely is it that the group whose assumption starts farther from the true underlying probability will a) get to the real answer and b) realize how 'good' their methods actually are?
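
Here is my (probably naive) attempt to put that scenario into code, treating each study as a simple Bayesian update; all the numbers are made up, so please correct me if this is the wrong way to think about it:

Code:
def update_belief(prior, p_positive_if_true=0.8, p_positive_if_false=0.05):
    # One Bayesian update after seeing a 'positive' study result.
    numerator = prior * p_positive_if_true
    denominator = numerator + (1 - prior) * p_positive_if_false
    return numerator / denominator

for starting_prior in (0.0, 0.5, 1.0):
    belief = starting_prior
    for _ in range(5):          # five positive studies in a row
        belief = update_belief(belief)
    print(f"start at {starting_prior:.0%} -> after 5 positive studies: {belief:.3f}")
# A prior of exactly 0 or 1 never moves, no matter what the data say;
# only a group that starts somewhere in between actually updates.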


Because if this turns out to be really hard, I would intuitively assume that you can be really wrong for a long time if the basic unproven 'axioms' of your field kind of suck and you do not know that they do, in fact, suck. You know, like the basic assumption that the thoughts your brain produces can be the origin of the dysfunction of the very organ producing them, which to me seems inherently illogical - or at least something that needs to be pretty much conclusively proven before simply accepting it as fact and basing an entire industry on it. I hope that was not too convoluted a sentence.


I hope all of this does not derail the thread too much, but I can't be the only one who struggles to understand this stuff. Am I even making sense? If someone wants to take the time to school me a bit on this, I assure you my mind would likely be blown. I do not know if that is worth anyone's time, though ;)
 

JaimeS

Senior Member
Messages
3,408
Location
Silicon Valley, CA
I would have considered the study that was posted in the OP of this thread to look pretty solid and, at the very least, well-intentioned.

It is. The study talks about how you can make anything look significant if you mess around with your data. It focuses specifically on how researcher bias can affect study results: because you have the ability to define what counts as 'right' and 'wrong', p-values end up having little meaning - which is part of the point of the article. For more, Andrew Gelman (Columbia), who has cited the OP article, also wrote this article on p-hacking.
 

JaimeS

Senior Member
Messages
3,408
Location
Silicon Valley, CA
- You have to make at least some kind of an educated guess at the real probability of whatever you want to look at....

We run into trouble when people start focusing on how to prove they're right, rather than on how they might be proven wrong. They create hypotheses that are unassailable, no matter what the data tell them.

It's this kind of thinking that exemplifies bad science: "no matter what evidence is found, I am correct."

_____________________________________________

This is why it is helpful in science -- and in all areas where belief or bias might come into play -- to ask yourself the following question:

What would convince me that my idea is incorrect?

If the answer is "no evidence could convince me I am incorrect", then you are not dealing with a scientific theory: you are dealing with a belief. Beliefs are an entirely different kettle of fish: a kettle of fish that ought not to be allowed to swim in peer-reviewed journals, unless that journal discusses philosophy or theology.

Researchers who set out to work on something that deeply connects to their belief system design studies that cannot actually disprove their hypotheses; or they p-hack the data until it appears to say what they want by deleting an outlier here, changing a statistical analysis algorithm there. Often they do not even think they are doing anything wrong. They are so SURE, you see, that the best representation of the data -- the most honest representation of the data -- is the one that confirms their belief. Or, as one researcher told me with a totally straight face:

Just keep feeding it through different statistical analyses until the computer spits out a positive result; then, publish.

No, not at Stanford, this was long ago. ;)
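
Just to show how badly that "strategy" behaves, here's a toy simulation of my own (not from any paper; the particular analyses are invented for illustration): pure noise, run through several different analyses, counted as a "finding" if any one of them comes out significant.

Code:
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def significant_after_shopping_around(n=30, alpha=0.05):
    # Two groups of pure noise, analysed several different ways;
    # count it as a 'finding' if ANY analysis comes out significant.
    a = rng.normal(size=n)
    b = rng.normal(size=n)
    trimmed_a, trimmed_b = np.sort(a)[2:-2], np.sort(b)[2:-2]  # drop the extremes
    p_values = [
        stats.ttest_ind(a, b).pvalue,                  # plain t-test
        stats.mannwhitneyu(a, b).pvalue,               # switch to a non-parametric test
        stats.ttest_ind(trimmed_a, trimmed_b).pvalue,  # re-test after "removing outliers"
        stats.ttest_ind(np.abs(a), np.abs(b)).pvalue,  # "transform the data" first
    ]
    return min(p_values) < alpha

rate = sum(significant_after_shopping_around() for _ in range(5_000)) / 5_000
print(f"'Significant' findings from pure noise: {rate:.2f}")  # well above the nominal 0.05

Each individual test holds its 5% error rate; it's the freedom to keep shopping for one that "works" that inflates it.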

____________________________________________

So first you state your theory Y. Then you figure out an experiment that will honestly disprove it if you get result X.

Then you take the time to carefully, meticulously do the experiment.

If you have the resources, get somebody else to run the same experiment a few times as well. Make sure that they get the same sorts of results you do; if possible, run side-by-side and ensure you're getting the same data for the exact same samples... If possible, blind them so that you can't tell who's who...

_________________________________________

It's also advisable to make a statistical plan of attack -- insofar as that's possible -- before you finish gathering your data. E.g. "I expect I'll see two groups emerge here, what should I do? If I only see one type of result, what then?" I think it reduces the temptation to mess with your data.

This only helps if you're honest, of course.

__________________________________________

Hopefully this was helpful, @Philipp , and addresses some of your thinking above.
 

Alvin2

The good news is patients don't die the bad news..
Messages
2,997
Researchers who set out to work on something that deeply connects to their belief system design studies that cannot actually disprove their hypotheses; or they p-hack the data until it appears to say what they want by deleting an outlier here, changing a statistical analysis algorithm there. Often they do not even think they are doing anything wrong. They are so SURE, you see, that the best representation of the data -- the most honest representation of the data -- is the one that confirms their belief. Or, as one researcher told me with a totally straight face:
This is a problem with human nature: our reasoning processes work on incomplete data, experience and gut feelings. That was the best available when information was passed on by word of mouth and science was as complicated as 'what happened last time I did this dangerous thing', 'is it edible' and 'have we made the gods of weather angry'.
Today we have the written word and an attempt to parse reality from observations, but our tendency to stick to our beliefs and outdated reasoning mechanisms is as strong as ever, and it manifests in everything from confirmation bias to "fixing" results to conflicts of interest to outright fraud. This is a huge challenge we will have to deal with; the mechanisms used so far are an attempt, but not enough to overcome our inherent logical flaws.
 

Dolphin

Senior Member
Messages
17,567
If we start off with a very low probability something is real, the chance of a positive result being false is essentially certain.
If we start off with a high probability, then the chance of a false positive approaches 5%.
What's called Bayesian statistics uses this approach (i.e. prior probabilities).