The problem with p-values

Sean

Senior Member
Messages
7,378
https://aeon.co/essays/it-s-time-for-science-to-abandon-the-term-statistically-significant
The problem with p-values
Academic psychology and medical testing are both dogged by unreliability. The reason is clear: we got probability wrong

The aim of science is to establish facts, as accurately as possible. It is therefore crucially important to determine whether an observed phenomenon is real, or whether it’s the result of pure chance. If you declare that you’ve discovered something when in fact it’s just random, that’s called a false discovery or a false positive. And false positives are alarmingly common in some areas of medical science.

In 2005, the epidemiologist John Ioannidis at Stanford caused a storm when he wrote the paper ‘Why Most Published Research Findings Are False’, focusing on results in certain areas of biomedicine. He’s been vindicated by subsequent investigations. For example, a recent article found that repeating 100 different results in experimental psychology confirmed the original conclusions in only 38 per cent of cases. It’s probably at least as bad for brain-imaging studies and cognitive neuroscience. How can this happen?

[See rest of article at above link.]


Related thread:

http://forums.phoenixrising.me/index.php?threads/scientific-method-statistical-errors-nature-2014-on-problems-with-p-values-and-ways-forward.39613/

 

Keela Too

Sally Burch
Messages
900
Location
N.Ireland
I think I read this article earlier.... and having used stats a little bit in my previous work, here are some things that bug me about the p = 0.05 value.

Even "statistically significant" outcomes allow for a 5% chance that data like this would line up purely by chance, even when there is no real effect. So p = 0.05 should never be regarded as proof. And of course multiple attempts at attaining that significance will eventually produce the desired* outcome ;)
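To make that last point concrete, here is a minimal simulation sketch (Python with NumPy/SciPy; not from the article or this thread, and the sample sizes and counts are arbitrary illustrative choices). It repeats a null experiment 20 times and checks how often at least one attempt crosses p < 0.05 purely by chance.

```python
# Minimal sketch: repeat the same null experiment many times and see how
# often at least one attempt comes out "statistically significant".
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_attempts = 20         # repeated tries at the same (null) experiment
n_per_group = 30        # participants per group in each attempt
n_simulations = 10_000  # repetitions of the whole exercise

hits = 0
for _ in range(n_simulations):
    p_values = [
        stats.ttest_ind(rng.normal(size=n_per_group),
                        rng.normal(size=n_per_group)).pvalue
        for _ in range(n_attempts)
    ]
    if min(p_values) < 0.05:
        hits += 1

print(f"At least one p < 0.05 in {hits / n_simulations:.1%} of simulations")
# With 20 attempts, roughly 1 - 0.95**20 ≈ 64% of simulations turn up a
# "significant" result even though there is nothing to find.
```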

Another problem is the difference in the way the word "significant" is used in science and in general parlance: If I said I was "significantly better", you would expect that the difference between my prior state and my current one was large. You would hear the word "significantly" as if I'd said "substantially".

So if I said the results of a trial showed a "statistically significant" improvement, you might therefore expect the effect to be large, yet the actual change might still be minuscule! Statistical calculations work to pick up even small overall directional changes. A "statistically significant" result shows that most of the trial outcomes were swayed in the same direction, but it does not indicate by how much.

In something like a step test, this outcome could easily be swayed by cheering the participants a bit louder on their later attempts, or by telling them their therapy had certainly worked, making the participants push themselves a tiny bit more ;)

It would be possible that even if most people in a study improved by only 10 steps during a second test, the result could still be "statistically significant". No matter that an extra 10 steps would be totally insignificant to improving the quality of life of the individuals concerned!
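As a rough sketch of that step-test point (illustrative numbers only: a hypothetical 200-person trial and a nominal ~1,000-step baseline, not from any real study), a mean gain of around 10 steps can clear p < 0.05 simply because the sample is large enough:

```python
# Minimal sketch: a ~10-step mean improvement that is "statistically
# significant" yet clinically trivial.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_participants = 200                                        # hypothetical trial
baseline = rng.normal(1000, 150, n_participants)            # first step test
followup = baseline + rng.normal(10, 60, n_participants)    # ~10 extra steps

change = followup - baseline
t_stat, p_value = stats.ttest_1samp(change, 0.0)            # paired comparison

print(f"mean improvement: {change.mean():.1f} steps, p = {p_value:.4f}")
# With enough participants this tiny, consistent nudge will usually come out
# below p = 0.05, but 10 extra steps says nothing about anyone's quality of life.
```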

I think there has been a play on the use of the word "significant" by those who want to imply that the benefits of certain therapies are substantial enough to be useful. ;)


Note:
*Scientists should not have a favoured (or desired) outcome. They should be seeking truth. Thus all trials should be reported to show how many statistically insignificant results were obtained before that amazing "statistically significant" one appeared!

(Edited to remove an outbreak of the word "so"... :p )
 
Messages
2,158
I seem to remember that when I taught A-level statistics we taught the students that a 5% significance level was OK as an indicator that it might be worth doing further research, but that for something that 'matters', like medical research, a 1% level, or an even stricter one, is essential to cut down on false positives.
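A minimal simulation along those lines (Python; the sample sizes and study count are arbitrary illustrative choices): when the treatment does nothing at all, about 1 in 20 studies clears the 5% bar and about 1 in 100 clears the 1% bar.

```python
# Minimal sketch: false positive rates under a true null at the 5% vs 1% level.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_studies = 10_000      # hypothetical studies where the treatment does nothing
n_per_group = 50

p_values = np.empty(n_studies)
for i in range(n_studies):
    control = rng.normal(size=n_per_group)
    treated = rng.normal(size=n_per_group)   # same distribution: no real effect
    p_values[i] = stats.ttest_ind(control, treated).pvalue

print(f"'significant' at 5%: {(p_values < 0.05).mean():.1%}")   # ~5% of studies
print(f"'significant' at 1%: {(p_values < 0.01).mean():.1%}")   # ~1% of studies
# Moving the threshold from 5% to 1% cuts the false positive rate roughly
# fivefold, which is the case for stricter levels where the result 'matters'.
```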

I was therefore pretty shocked to see the way medical, and particularly psychological, studies seem to:

a) use 5% across the board as a sort of magic number.

b) make no distinction between statistical and clinical significance.

c) not even mention the probability of false positives, let alone attempt to calculate it.

d) p-hack their way through masses of p-values generated by computer analysis of questionnaires, singling out the few p-values that happen to fall under the magic 5% value and claiming clinical significance for what is likely to be simply random variation (a minimal sketch of this appears after the list).

e) inappropriately use statistical tests designed for physical data measured on linear scales with normal distributions to analyse psychological data based on subjective, non-linear 'scales'.

f) interpret correlations as causation in the direction that suits their pet theory.
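
Point (d) above is easy to demonstrate with a minimal simulation (Python; the 60 questionnaire items and group sizes are made-up numbers for illustration): analyse enough items containing pure noise and a few will cross p < 0.05 anyway.

```python
# Minimal sketch of point (d): test many questionnaire items with no real
# signal and report only the ones that happen to cross p < 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_items = 60          # hypothetical number of questionnaire items
n_per_group = 40

spurious_hits = []
for item in range(n_items):
    treated = rng.normal(size=n_per_group)   # pure noise: no treatment effect
    control = rng.normal(size=n_per_group)
    p = stats.ttest_ind(treated, control).pvalue
    if p < 0.05:
        spurious_hits.append((item, p))

print(f"{len(spurious_hits)} of {n_items} items 'significant' by chance alone:")
for item, p in spurious_hits:
    print(f"  item {item:2d}: p = {p:.3f}")
# With 60 items you expect about 3 spurious hits; singling those out and
# claiming clinical significance is the p-hacking described in (d).
```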

I now look at any claim made by psychiatrists and psychologists that their treatment is 'proven' as having a very high probability of being wrong. I no longer trust psychological research at all. Nor do I trust medical research with questionnaire based outcome measures.
 

alex3619

Senior Member
Messages
13,810
Location
Logan, Queensland, Australia
P values are about estimating chance. They do not, in any way, compensate for poor methodology. Bad research and fraudulent research can have good p values. Good research can have bad p values as well, especially if it is underfunded, uses small cohorts, looks for things with a small effect size, or is a limited study like a pilot study. In pilot studies chance becomes much more important.
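A minimal simulation of that last point about pilot studies (Python; the half-standard-deviation effect and the group sizes are illustrative assumptions, not from the post): the same real effect gives an unstable, usually non-significant p-value at pilot scale and a much more reliable one with a larger cohort.

```python
# Minimal sketch: the same modest real effect, tested at pilot scale (n = 10
# per group) and at a larger scale (n = 100 per group).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
true_effect = 0.5       # a real effect of half a standard deviation
n_simulations = 2_000

def simulate(n_per_group):
    ps = []
    for _ in range(n_simulations):
        control = rng.normal(0.0, 1.0, n_per_group)
        treated = rng.normal(true_effect, 1.0, n_per_group)
        ps.append(stats.ttest_ind(control, treated).pvalue)
    return np.array(ps)

for n in (10, 100):
    ps = simulate(n)
    print(f"n = {n:3d} per group: p < 0.05 in {(ps < 0.05).mean():.0%} of runs, "
          f"p ranges from {ps.min():.3f} to {ps.max():.3f}")
# At pilot scale the real effect is missed most of the time and the p-value
# is dominated by chance; with the larger cohort it is detected in the great
# majority of runs.
```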
 

alex3619

Senior Member
Messages
13,810
Location
Logan, Queensland, Australia
Something I have been thinking about over the last few days is p-hacking. If you p-hacked your old publications and then choose to go fishing for those results again, using the same or similar methodology might increase the chance you will find them again, so you might not have to p-hack the current study. This approach might be valid sometimes, but you risk finding associations rather than causes, and so it is not good evidence for causality.
 

alex3619

Senior Member
Messages
13,810
Location
Logan, Queensland, Australia
High reliance on intuition (and heuristics) is a hallmark of expertise. It always comes with a risk of insufficient rational analysis. The discipline of behavioural economics looks at this. So much economic behaviour is based on intuition that it cannot be summed up as rational.
 

TiredSam

The wise nematode hibernates
Messages
2,677
Location
Germany
I no longer trust psychological research at all. Nor do I trust medical research with questionnaire based outcome measures.
It's reached the point where if I'm listening to the news about some new medical discovery, as soon as I hear the word "questionnaire" I stop listening. If there is any way to measure something objectively but a questionnaire has been used instead, I ask myself why and assume the conclusion of the study only has a 0.05 chance of not being a pile of crap. Before I had ME and the education that goes with it I probably wouldn't even have thought about such things, just thought "oh, they've discovered that now have they?"