The problem with p-values

Discussion in 'Other Health News and Research' started by Sean, Jan 5, 2017.

  1. Sean

    Sean Senior Member

    Messages:
    3,257
    Likes:
    17,984
    https://aeon.co/essays/it-s-time-for-science-to-abandon-the-term-statistically-significant


    Related thread:

    http://forums.phoenixrising.me/index.php?threads/scientific-method-statistical-errors-nature-2014-on-problems-with-p-values-and-ways-forward.39613/

     
    Last edited: Jan 5, 2017
    actup, PhoenixDown, Hutan and 3 others like this.
  2. Keela Too

    Keela Too Sally Burch

    Messages:
    897
    Likes:
    3,983
    N.Ireland
    I think I read this article earlier.... and having used stats a little bit in my previous work, here are some things that bug me about the p = 0.05 value.

    Even "statistically significant" outcomes allow for a 5% chance that the data points aligned that way by chance. So p= 0.05 should never be regarded as a proof. And of course multiple attempts at attaining that significance will eventually produce the desired* outcome ;)

    Another problem is the difference in the way the word "significant" is used in science and in general parlance: if I said I was "significantly better", you would expect that the difference between my prior state and my current one was large. You would hear "significantly" as if I'd said "substantially".

    So if I said the results of a trial showed a "statistically significant" improvement, you might therefore expect the effect to be large, yet the actual change might still be minuscule! Statistical calculations work to pick up even small overall directional changes. A "statistically significant" result shows that most of the trial outcomes were swayed in the same direction, but it does not indicate by how much.

    In something like a step test, this outcome could easily be swayed by cheering the participants a bit louder on their later attempts, or by telling them their therapy had certainly worked, making them push themselves a tiny bit more ;)

    It would be possible that even if most people in a study improved their steps by only 10 during a second test, the result could still be "statistically significant". No matter that the extra 10 steps would be totally insignificant for the quality of life of the individuals concerned!
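    A rough sketch of that "tiny but significant" scenario (the step counts, spread, and trial size below are my assumptions, not from any real trial): with enough participants, an average gain of just 10 steps on a ~1000-step test reaches a very small p-value.

    ```python
    # Large trial, genuine but trivial effect: +10 steps on a mean of ~1000.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n = 10_000                               # participants per arm (assumed)
    control = rng.normal(1000, 200, n)       # step counts, control arm
    treated = rng.normal(1010, 200, n)       # treatment arm: +10 steps on average

    t, p = stats.ttest_ind(treated, control)
    print(f"mean difference = {treated.mean() - control.mean():.1f} steps, p = {p:.2e}")
    # ~1% more steps: "statistically significant", clinically meaningless.
    ```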

    I think there has been a play on the use of the word "significant" by those who want to imply that the benefits of certain therapies are substantial enough to be useful. ;)


    Note:
    *Scientists should not have a favoured (or desired) outcome. They should be seeking truth. Thus all trials should be reported to show how many statistically insignificant results were obtained before that amazing "statistically significant" one appeared!

    (Edited to remove an outbreak of the word "so"... :p )
     
    Last edited: Jan 5, 2017
    actup, Sean, Mel9 and 7 others like this.
  3. trishrhymes

    trishrhymes Senior Member

    Messages:
    2,153
    Likes:
    17,870
    I seem to remember that when I taught A-level statistics, we taught the students that a 5% significance level was OK as an indicator that it might be worth doing further research, but that for something that 'matters', like medical research, a 1% level, or an even stricter one, is essential to cut down on false positives.

    I was therefore pretty shocked to see the way medical, and particularly psychological, studies seem to:

    a) use 5% across the board as a sort of magic number.

    b) make no distinction between statistical and clinical significance.

    c) not even mention the probability of false positives, let alone attempt to calculate it.

    d) p-hack their way through masses of p-values generated by computer analysis of questionnaires, singling out the few p-values that happen to fall under the magic 5% threshold and claiming clinical significance for what is likely to be simply random variation (illustrated in the sketch below).

    e) inappropriately apply statistical tests designed for physical data measured on linear scales with normal distributions to psychological data based on subjective, non-linear 'scales'.

    f) interpret correlations as causation in the direction that suits their pet theory.

    I now look at any claim made by psychiatrists and psychologists that their treatment is 'proven' as having a very high probability of being wrong. I no longer trust psychological research at all. Nor do I trust medical research with questionnaire based outcome measures.
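    As a hedged illustration of point (d) above (toy numbers of my own, not from any study): run a t-test on 40 questionnaire items that are pure noise and see which ones cross the magic threshold anyway.

    ```python
    # 40 questionnaire items, no real treatment effect anywhere; a handful of
    # items will still fall under p < 0.05 by chance alone (expected: ~2).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n_items, n_per_arm = 40, 50

    for item in range(n_items):
        treated = rng.normal(size=n_per_arm)   # random answers, no effect
        control = rng.normal(size=n_per_arm)
        p = stats.ttest_ind(treated, control).pvalue
        if p < 0.05:
            print(f"item {item}: p = {p:.3f}  <- 'significant' by chance")
    ```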
     
    TiredSam, actup, Sean and 13 others like this.
  4. Keela Too

    Keela Too Sally Burch

    Messages:
    897
    Likes:
    3,983
    N.Ireland
    Sean and trishrhymes like this.
  5. Jenny TipsforME

    Jenny TipsforME Senior Member

    Messages:
    1,133
    Likes:
    3,822
    Bristol
    The word "proven" always makes me suspicious. It suggests the person claiming it doesn't understand these points about statistics and probability.
     
    Sean and Keela Too like this.
  6. Mel9

    Mel9 Senior Member

    Messages:
    415
    Likes:
    981
    NSW Australia

    In my work I like P to be less than 0.001
     
  7. Sean

    Sean Senior Member

    Messages:
    3,257
    Likes:
    17,984
    Yeah, where is the 6-sigma for psych research?
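    For reference (my numbers, not the poster's), here is the two-sided p-value that a k-sigma result corresponds to under a normal model:

    ```python
    # Convert sigma thresholds to two-sided p-values.
    from scipy import stats

    for k in (2, 3, 5, 6):
        p = 2 * stats.norm.sf(k)   # sf = survival function = 1 - CDF
        print(f"{k}-sigma -> p ~ {p:.1e}")
    # 2-sigma ~ 4.6e-02 (the usual 0.05); 5-sigma (particle physics) ~ 5.7e-07;
    # 6-sigma ~ 2.0e-09.
    ```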
     
    trishrhymes likes this.
  8. alex3619

    alex3619 Senior Member

    Messages:
    12,480
    Likes:
    35,011
    Logan, Queensland, Australia
    P-values are about estimating chance. They do not, in any way, substitute for poor methodology. Bad research and fraudulent research can have good p-values. Good research can have bad p-values too, especially if it is underfunded, uses small cohorts, looks for small effect sizes, or is a limited study like a pilot. In pilot studies chance becomes much more important.
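    A small sketch of the pilot-study point (the effect size and group size are my assumptions): with a genuine but modest effect and tiny groups, the p-value swings wildly between runs, so any single pilot result says little either way.

    ```python
    # Genuine 0.4-SD effect, pilot-sized groups of 15: power is only ~20%,
    # so the p-value is mostly driven by chance.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    for run in range(5):
        control = rng.normal(0.0, 1.0, 15)
        treated = rng.normal(0.4, 1.0, 15)    # real effect of 0.4 SD
        print(f"run {run}: p = {stats.ttest_ind(treated, control).pvalue:.3f}")
    ```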
     
  9. Snow Leopard

    Snow Leopard Hibernating

    Messages:
    4,613
    Likes:
    12,435
    South Australia
    The problem isn't p-values, it's the poor statistical intuition of, well, almost everyone (scientists and medical practitioners included).

    P-values are fine if you don't use them as a shortcut instead of actually using your brain.
     
  10. alex3619

    alex3619 Senior Member

    Messages:
    12,480
    Likes:
    35,011
    Logan, Queensland, Australia
    Something I have been thinking about over the last few days is p-hacking. If you p-hacked your way to results in old publications and then go fishing for those results again, using the same or similar methodology might increase the chance of finding them again, so you might not have to p-hack the current study. This approach might be valid sometimes, but you risk finding associations rather than causes, so it is not good evidence for causality.
     
    Sean, Keela Too and trishrhymes like this.
  11. alex3619

    alex3619 Senior Member

    Messages:
    12,480
    Likes:
    35,011
    Logan, Queensland, Australia
    Gigerenzer has shown in multiple studies that about 80% of his American medical doctor cohorts do not understand basic statistics. They fail, in study after study.
     
    Sean, Keela Too and trishrhymes like this.
  12. alex3619

    alex3619 Senior Member

    Messages:
    12,480
    Likes:
    35,011
    Logan, Queensland, Australia
    High reliance on intuition (and heuristics) is a hallmark of expertise. It always comes with a risk of insufficient rational analysis. The discipline of behavioural economics looks at this: so much economic behaviour is based on intuition that it cannot be summed up as rational.
     
    Last edited: Jan 6, 2017
    Sean and trishrhymes like this.
  13. alex3619

    alex3619 Senior Member

    Messages:
    12,480
    Likes:
    35,011
    Logan, Queensland, Australia
    Results need to be rational, or you have to do more research to find the rational explanation.
     
    Sean, Keela Too and trishrhymes like this.
  14. TiredSam

    TiredSam The wise nematode hibernates

    Messages:
    2,677
    Likes:
    21,535
    Germany
    It's reached the point where, if I'm listening to the news about some new medical discovery, as soon as I hear the word "questionnaire" I stop listening. If there is any way to measure something objectively but a questionnaire has been used instead, I ask myself why, and assume the conclusion of the study only has a 0.05 chance of not being a pile of crap. Before I had ME, and the education that goes with it, I probably wouldn't even have thought about such things, just "oh, they've discovered that now, have they?"
     
    natasa778, Sean, actup and 2 others like this.
