
Is science broken? The reproducibility crisis

Discussion in 'Other Health News and Research' started by Simon, Mar 25, 2015.

  1. Simon

    Simon

    Messages:
    1,921
    Likes:
    14,538
    Monmouth, UK
    Interesting blog about a meeting at UCL this week looking at scientific standards in psychology and neuroscience, with an appearance there by blogger Neuroskeptic.

    Is science broken? The reproducibility crisis
    by Liz Bal at Biomed Central blog network

    highlights:
    Chris Chambers, Professor of Psychology and Neuroscience at Cardiff University, blamed this problem on the pressure to publish ‘good’ results. Too often, the quality of science is measured by the perceived level of interest, novelty and impact of the results. This leads to a number of problems in the research process – publication bias; significance chasing; ‘HARKing’ (hypothesizing after the results are known); a lack of data sharing and replication; and low statistical power.

    He argues for pre-registration of studies with journals: protocols are peer-reviewed and accepted on the basis of the methodology, then published regardless of results so long as the protocol was followed. He oversees this process at the journal Cortex.

    Neuroskeptic, Neuroscience, Psychology and Psychiatry researcher and blogger, said he became disillusioned by poor practices as a PhD student - referring to a “tacit decision” among scientists to accept methods that they would not dream of teaching to undergraduates.

    On the other hand, Sam Schwarzkopf, Research Fellow in Experimental Psychology at UCL, argued science is not broken and is actually working better than ever before.

    read the full blog
     
    Sidereal, Kati, Sean and 4 others like this.
  2. anciendaze

    anciendaze Senior Member

    Messages:
    1,806
    Likes:
    4,655
    This is not at all limited to psychology. I've had some interesting arguments about parametric statistics in medical research with people who should certainly know better. I am far from the first to question the meaning of published p values. Even in cases where my eyeball quickly tells me they are likely dealing with something other than a normal distribution, researchers persist in assuming they have one. When questioned, some senior researchers have said "you have to work hard to get a good normal distribution." This raises a suspicion in my mind that they are looking at samples, then rerunning sampling until they get a distribution that appears normal. Senior researchers will deny this, but if you talk to incautious graduate students you will hear that it is taking place.

    These same senior researchers will invoke the Central Limit Theorem to explain how they got the normal distributions in samples when the population distribution is far from normal. There is apparently no connection in their minds between rerunning sampling until you get what you want, and the essential prerequisite for the CLT that samples be independent. This is a massive violation of that condition.
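
    A minimal sketch of the resampling worry described above, under assumptions that are mine and not taken from any real study: the population is exponential (strongly skewed), each sample has 10 values, and a sample is "accepted" once its skewness is close to zero. The function names and the acceptance threshold are hypothetical.

```python
import random
import statistics


def sample_skewness(values):
    """Standardised third moment; close to 0 for a symmetric-looking sample."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def resample_until_symmetric(rng, size=10, max_abs_skew=0.2, max_attempts=100_000):
    """Redraw from a skewed population until a sample happens to look symmetric."""
    for attempt in range(1, max_attempts + 1):
        # Exponential population: true mean 1.0, true skewness 2.0 (assumed example).
        sample = [rng.expovariate(1.0) for _ in range(size)]
        if abs(sample_skewness(sample)) < max_abs_skew:
            return attempt, sample
    raise RuntimeError("no acceptable sample found")


rng = random.Random(2015)
attempts, accepted = resample_until_symmetric(rng)
print(f"accepted after {attempts} draw(s), skewness {sample_skewness(accepted):+.2f}")
print(f"accepted sample mean {statistics.fmean(accepted):.2f} (population mean is 1.0)")
```

    The accepted sample passes the "looks normal" check by construction, although the population it came from is nowhere near normal. Because the sample was selected after inspection, it is no longer the kind of independent draw that the usual formulas assume.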

    The one positive thing I can say about this approach is that their understanding of what they are doing is so defective they can't be sure which way they are biasing outcomes.
     
  3. alex3619

    alex3619 Senior Member

    Messages:
    12,491
    Likes:
    35,108
    Logan, Queensland, Australia
    There is nothing new covered in the blog, but that is not the point. The point is these things need to be discussed, and that is what is happening.
     
    Simon and Esther12 like this.
  4. Sean

    Sean Senior Member

    Messages:
    3,257
    Likes:
    17,985
    If the methodology were sound (and followed diligently), and results were published promptly, all these problems would be largely resolved.

    IOW, get the methodology right, and stick to it.

    (I have no in-principle problem with additional post-hoc analyses being done; they can probably sometimes add more useful info. But the original protocol must be followed for the primary paper, and published before any altered post-hoc analyses are run.)
     
    lycaena, Esther12 and Simon like this.
  5. alex3619

    alex3619 Senior Member

    Messages:
    12,491
    Likes:
    35,108
    Logan, Queensland, Australia
    Science is broken in many ways, but then it has always had issues. Today we also have the undue influence of private funding in science, and the rise of consensus views being considered science. One huge thing we need is a better educated public, not just on science but also on politics. We also need open publishing. As many people as possible need to be able to read every paper. Transparency is one key to adequate criticism.

    I am very in favour of detailed research plans being published before the study is done, and the study being considered with those plans in mind by reviewers and editors.
     
    Esther12 likes this.
  6. Simon

    Simon

    Messages:
    1,921
    Likes:
    14,538
    Monmouth, UK
    I think that's exactly right, and there needs to be a clear separation between what was planned and what is exploratory.

    So long as the labelling is clear, that's all well and good. I think p values and effect sizes come into it too: a chance finding that only scrapes significance or is only a small effect is by the by, but a study that throws up a sizeable and significant finding should discuss it. I would want to know about it, though I would also want to know what was planned and what was stumbled across.
     
    oceiv, Esther12, Sean and 1 other person like this.
  7. user9876

    user9876 Senior Member

    Messages:
    2,583
    Likes:
    18,182
    I don't really see what is wrong with a post hoc analysis, particularly if all data is publicly available for others to examine. The problem is that others don't examine the validity of the analysis and there is a culture of cherry picking. I tend to think just having a predefined analysis plan isn't sufficient, because people don't adequately review them. So, as experience is gained in an area, a pre-defined analysis can be written that cherry picks. In fact that can be done unintentionally by looking for techniques that have worked in the past.

    Two things are really needed.
    1) Make the data available
    2) Give credit to people who spend their time picking apart other people's data and methodology.
     
    lycaena, Esther12, Valentijn and 2 others like this.
  8. anciendaze

    anciendaze Senior Member

    Messages:
    1,806
    Likes:
    4,655
    Just to reiterate: common parametric statistics all depend on the assumption of normal distributions; if you are dealing with something else the meaning of the numbers you get is highly questionable. Normal (Gaussian) distributions originally arose in the context of instrumental errors, as in astronomy. If you are measuring positions of star images on a glass plate with a traveling microscope they work very well. In other contexts instrumental errors may scarcely be relevant, and the natural processes you are studying are likely to have other distributions. I keep mentioning Lévy distributions, but there are plenty of cases of power-law distributions in examples as different as earthquakes, stock market prices, reliability of machines and physiology.

    The assumption of a normal distribution is equivalent to saying only the first two moments of the distribution are significant. All measures of significance depend not only on estimated mean value, which is fairly reliable, but also on estimated standard deviation or variance, which is much less reliable. You can run computer experiments to see just how sensitive this is, even in the ideal world of a mathematical model, to such things as a small number of questionable outliers. Once you realize how vulnerable a study is to manipulation by including or excluding outliers, you should be very cautious about drawing inferences from it.
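
    A minimal sketch of the kind of computer experiment mentioned above, using made-up numbers (20 hypothetical measurements near 50 plus two hypothetical outliers). It only compares how far the mean and the sample standard deviation move when the outliers are included.

```python
import statistics

# Hypothetical measurements clustered around 50 (not data from any study).
core = [48, 49, 50, 51, 52, 47, 53, 50, 49, 51,
        50, 48, 52, 50, 51, 49, 50, 52, 48, 50]
# Two hypothetical questionable outliers.
outliers = [70, 75]

mean_without = statistics.fmean(core)
sd_without = statistics.stdev(core)

mean_with = statistics.fmean(core + outliers)
sd_with = statistics.stdev(core + outliers)

# Relative change caused by two extra points out of 22.
mean_ratio = mean_with / mean_without
sd_ratio = sd_with / sd_without

print(f"mean: {mean_without:.2f} -> {mean_with:.2f} (x{mean_ratio:.2f})")
print(f"sd:   {sd_without:.2f} -> {sd_with:.2f} (x{sd_ratio:.2f})")
```

    In this toy case the mean shifts by a few percent while the standard deviation grows several-fold, so anything built on the standard deviation (thresholds, significance tests) changes with the decision to keep or drop two points.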

    If the underlying distribution is like a Lévy distribution, the analytical expression of the distribution will not even have a well-defined standard deviation. You can perform arithmetic on a sample set to get a number, but this will be completely dependent on the bounds on sampling and the number of samples -- factors which are entirely under the control of experimenters. The temptation to adjust these parameters to produce desired results should be obvious.
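
    A minimal sketch of this dependence on sample size and sampling bounds. As a stand-in for a heavy-tailed case I assume a Pareto distribution with tail index 1.5 (finite mean, infinite variance), which is not the Lévy distribution itself, and I replace random sampling with evenly spaced quantiles so the output is repeatable. The cap of 50 is a hypothetical bound on what gets recorded.

```python
import statistics

ALPHA = 1.5  # assumed tail index: the mean exists, the variance does not


def pareto_quantile(u, alpha=ALPHA):
    """Inverse CDF of a Pareto distribution with minimum value 1."""
    return (1.0 - u) ** (-1.0 / alpha)


def idealised_sample(n):
    """n evenly spaced quantiles, standing in for a typical sample of size n."""
    return [pareto_quantile((i + 0.5) / n) for i in range(n)]


def sd_of_sample(n, cap=None):
    """Population-style SD of the idealised sample, optionally dropping values above cap."""
    values = idealised_sample(n)
    if cap is not None:
        values = [v for v in values if v <= cap]
    return statistics.pstdev(values)


sizes = (100, 1_000, 10_000, 100_000)
uncapped = [sd_of_sample(n) for n in sizes]
capped = [sd_of_sample(n, cap=50.0) for n in sizes]

for n, free, bounded in zip(sizes, uncapped, capped):
    print(f"n={n:>7}  sd without bound: {free:6.2f}  sd with bound 50: {bounded:6.2f}")
```

    Without a bound, the computed standard deviation keeps growing as the sample gets larger; with a bound, it settles at a value set by the bound. Either way the number reflects choices made by whoever designed the sampling.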

    Even in textbook cases where you really do have a normal distribution, like measured heights of army recruits, you can make nonsense of this assumption simply by mixing together the two different normal distributions for male and female heights. (Tip: you can't depend on "the law of large numbers" to save you here, because 2 is not a large number. This is one of the inside secrets I learned from advanced training in mathematics.) It is all too easy to find published examples of such blunders, and the PACE assumption that healthy and sick people are really the same is not an isolated absurdity.
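
    A minimal sketch of the mixing problem, with assumed illustrative figures (means of 176 cm and 163 cm, standard deviation 7 cm for both groups; these are not recruit data). Evenly spaced quantiles stand in for samples, and excess kurtosis, which is 0 for a normal distribution, is used as a simple check on shape.

```python
from statistics import NormalDist, fmean, pstdev


def quantile_sample(dist, n):
    """n evenly spaced quantiles of dist, a repeatable stand-in for a sample."""
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


def excess_kurtosis(values):
    """Standardised fourth moment minus 3; about 0 for normally distributed data."""
    mean = fmean(values)
    sd = pstdev(values)
    return sum(((v - mean) / sd) ** 4 for v in values) / len(values) - 3.0


# Assumed illustrative height distributions in cm.
men = NormalDist(mu=176.0, sigma=7.0)
women = NormalDist(mu=163.0, sigma=7.0)

N = 2000
single_group = quantile_sample(men, N)
mixed_group = quantile_sample(men, N) + quantile_sample(women, N)

print(f"single group excess kurtosis: {excess_kurtosis(single_group):+.2f}")
print(f"mixed group excess kurtosis:  {excess_kurtosis(mixed_group):+.2f}")
```

    Each group on its own is normal, yet the pooled data are noticeably flatter than a normal curve, and collecting more of the same mixture does not change that.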
     
    Valentijn likes this.
  9. Esther12

    Esther12 Senior Member

    Messages:
    8,449
    Likes:
    28,522
    I'm not sure I'm following the points you're making @anciendaze.

    eg I don't get this:

    Admittedly, I did have to look up what a Lévy distribution was in order to even try, but can you dumb this down even more?

    I haven't really thought about how routinely normal distribution is assumed in medical papers. I need to kick myself into gear with learning some more basic stats. I keep putting this off.

    re PACE combining sick and healthy people: I get what you're saying, but also, you could say that splitting sick and healthy people requires a somewhat arbitrary division be made.
     
    oceiv likes this.
  10. Valentijn

    Valentijn Senior Member

    Messages:
    14,281
    Likes:
    45,814
    My basic understanding is that there are certain calculations which can only be done using data points which have a "normal distribution" and look similar to a standard bell curve. If there is no normal distribution, then the concept of a standard deviation cannot be applied, and is meaningless babble.

    Scatterplot graphs are great (and should be mandatory whenever applicable!), because you can glance at them and get a good idea of the distribution. If it's got a single high point and decreases similarly on both sides, then it might be a normal distribution. If there's a clump on one side and trails off to the other side, or has multiple high points, or clumps scattered around, or just has data points all over the place, it isn't a normal distribution, and standard deviation and other calculations are inappropriate and likely to be misleading. And there's the added bonus of seeing any extreme outliers which have skewed other calculations (averages).

    The SF-36 PF scale, which was used in PACE, does not have a normal distribution. It's very heavily skewed toward the higher scores around 90-100, with the percentage of the population having lower scores trickling away to the left as the scores go toward 0. Hence using a standard deviation with the SF-36 PF was extremely inappropriate, and created the problem where "very sick" in the real world was 65 or lower, and "normal/recovered" in their world of mangled statistics was 60 or higher.
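
    A minimal sketch of why "mean minus one standard deviation" behaves oddly on a scale with a ceiling. The counts below are made up to mimic a distribution bunched at 90-100 with a long left tail; they are not SF-36 PF population data.

```python
from statistics import NormalDist, fmean, median, pstdev

# Hypothetical number of people at each score on a 0-100 scale (100 people in total).
counts = {100: 40, 95: 20, 90: 12, 85: 8, 80: 5, 70: 4,
          60: 3, 50: 3, 40: 2, 30: 1, 20: 1, 10: 1}
scores = [score for score, count in counts.items() for _ in range(count)]

mean = fmean(scores)
mid = median(scores)
sd = pstdev(scores)

lower = mean - sd   # the "one SD below the mean" style of threshold
upper = mean + sd   # its mirror image above the mean

share_below = sum(s < lower for s in scores) / len(scores)
share_above = sum(s > upper for s in scores) / len(scores)
normal_share = NormalDist().cdf(-1.0)  # share beyond one SD in each tail of a normal curve

print(f"mean {mean:.1f}, median {mid:.1f}, sd {sd:.1f}")
print(f"threshold mean - sd = {lower:.1f}, mean + sd = {upper:.1f}")
print(f"share below: {share_below:.2f}, share above: {share_above:.2f}, "
      f"normal curve expects {normal_share:.2f} in each tail")
```

    With these made-up counts the mean sits well below the median, and "mean plus one SD" lands above the top of the scale, so the familiar picture of how many people fall within one standard deviation of the mean no longer applies.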
     
    Last edited: Mar 27, 2015
    Esther12 and barbc56 like this.
  11. Valentijn

    Valentijn Senior Member

    Messages:
    14,281
    Likes:
    45,814
    Coursera has a lot of great statistics courses. One which is basic and started a few days ago is https://www.coursera.org/course/biostats

    It's free to register with Coursera, pretty simple to use it, and you get a cute little certificate to download and/or print when you complete a course :smug:
     
    Esther12 and barbc56 like this.
  12. anciendaze

    anciendaze Senior Member

    Messages:
    1,806
    Likes:
    4,655
    Part of the problem with education in statistics comes from indoctrination with a particular worldview. I came up a different path, and was already using some pretty advanced statistical ideas (probability amplitudes) which could be compared with objective measurements before I had an elementary course. This made the implicit assumptions much more obvious to me, since I already knew counterexamples.

    The extent to which fundamental data on physiology depart from the conceptual model in which there are important mean values and meaningless random variation around these can be judged by such publications as these: fractal dynamics in physiology, power law RR-interval in heart, pulmonary power laws. These are really fundamental physiological variables which behave quite differently from expectations based on mean values, homeostasis and random variation. Work on gait and movement planning may even be directly applicable to people with MS, ME or Parkinson's disease.

    The simplified version of my argument is that you really need evidence you are dealing with normal distributions before you blithely assume anything you encounter will conform to such expectations.

    In the case of PACE, where the original population distribution is one-sided and has a long left tail, with mean, median and mode entirely different, the problem becomes more subtle because sampling is said to convert this into a set of normally-distributed groups in each arm of the study. The original assumption of "one standard deviation below the mean" was only used to produce an arbitrary threshold that sounded scientific. This statement of the problem had the implicit effect of trivializing the illness by using common assumptions of professional readers about the percentage of cases within one standard deviation of the mean of a normal distribution. This was a psychological maneuver to mislead readers and exploit known preconceptions about "CFS" which did not actually change any numerical results, only the way they were interpreted. Because we are dealing with professional psychologists and statisticians we must assume this was deliberate.

    The sampling question takes us much deeper into fundamental questions about statistical practices than busy doctors are likely to go. I've already said there are good reasons to believe the important idea of sample independence was massively violated. What distribution did individual groups have? I don't know. I can only say that the quoted p values don't mean very much.

    This still limits us to the clear air of theoretical statistics, where it is easy to decide if playing the game according to pre-established rules will result in particular numbers. What sneaks up on people glancing at a paper is the extent to which these numbers are clinically meaningless. Yes, you had some effect on group means. No, it wouldn't mean much even if you were talking about improving performance of patients with heart failure. Returning patients to mean values of the healthy population for their ages would have meant adding about 300 meters to distance walked in six minutes, not less than 40 meters. We still don't know if the change in group means was caused by a few misdiagnosed individuals returning to healthy norms, or by insignificant changes in the bulk of the group. The fact that data which would decide this is being withheld invites speculation over motives.
     
    Valentijn likes this.
