"Points of significance: Importance of being uncertain" (2-page educational piece on stats)

Discussion in 'Other Health News and Research' started by Dolphin, Oct 6, 2013.

  1. Dolphin

    Dolphin Senior Member

    This is going to be a monthly series.

    I can't remember where I saw the link that caused me to print this out.

    Anyway, it's an educational piece that explains sampling and the Central Limit Theorem. It requires little knowledge of mathematics.

    But there are lots and lots of educational pieces that make the same point, so it's not important to read this one specifically.

    I found the first column relatively dense and not really important.

    I think I only fully accepted the Central Limit Theorem when I played around with an online tool that automatically plotted the distribution of the mean for all sorts of weird samples. It is a little counter-intuitive that when one samples from all sorts of distributions, e.g. heavily skewed ones, the distribution of the sample means tends towards being normally distributed (i.e. a nice bell-shaped curve, especially when the samples drawn are not small). The implications of this are used a lot in statistics.
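For anyone who wants to try this without an online tool, here is a minimal sketch using only the Python standard library. It assumes an exponential distribution as the "heavily skewed" example; the function names, sample sizes, and seed are made up for illustration. It measures how lopsided the distribution of sample means is (skewness, where 0 means symmetric) as the sample size grows.

```python
import random
import statistics


def sample_means(n, reps, rng):
    """Return `reps` sample means, each from n draws of an exponential(1) variable."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]


def skewness(xs):
    """Standardised third moment: 0 for a symmetric (e.g. normal) distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)


rng = random.Random(0)  # fixed seed so the run is repeatable
# Sample sizes 1, 5 and 50 are arbitrary choices for illustration.
skew_by_n = {n: skewness(sample_means(n, 5000, rng)) for n in (1, 5, 50)}

for n, sk in skew_by_n.items():
    print(f"sample size {n:>2}: skewness of the sample means = {sk:.2f}")
```

The skewness should shrink towards zero as the sample size goes up, which is the bell curve emerging. Note that this sketch draws independent values from one single distribution, which is exactly the condition the theorem needs.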

  2. anciendaze

    anciendaze Senior Member

    Two points here: 1) the CLT is about sample means, not individual points in a sample; 2) it requires the sampling process be independent and identically-distributed for all sampled data. It is very easy to violate independence when all decisions are made by the same people; using dice for a small part of the process is not the same as preserving independence. The identically-distributed aspect says that mixing different populations may invalidate the CLT. With the current mess in ME/CFS diagnostic criteria it is practically impossible to meet it. There can be weaker conditions for the CLT, but the proofs get increasingly hard to follow.

    One bizarre consequence of different forms of the CLT is that a sufficiently large number of random variables with different distributions will also approach a Gaussian distribution, provided some conditions are met. At the extremes of identical distributions and many different distributions we can depend on the CLT, but mixing two very different distributions may well invalidate it. This is precisely what seems to have happened in the PACE study, where independence is also questionable. The numerical values reported for confidence are practically meaningless.

    Gaussian (normal) distributions work best when you are dealing with elementary particles, where there are fundamental reasons to assume they are truly identical, and the sampling process has to be independent because no one on Earth knows how to control individual particles. Having millions of particles also helps.

    Human beings are not nearly as desirable subjects for application of this theorem. At the other extreme, using Gaussian distributions to bet on the stock market is a recipe for disaster, even if advised by Nobel Prize winners.
  3. Simon


    Monmouth, UK
    Greetings, fellow stats geeks. Part 2 is out now, don't miss it!

    Points of Significance: Error bars : Nature Methods

    The article covers the different types of error bar and how to interpret them. The main types I've seen in CFS research are SEM ('standard error of the mean') and 95% confidence intervals, and you interpret them very differently. As the figure from the article shows, 95% confidence error bars can overlap a lot (sample 1 vs sample 2) and still indicate a p value of <0.05, while SEM error bars need to have a substantial gap between them to be significant.
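A rough worked version of that point, as a sketch rather than the article's own calculation. It assumes two independent samples with the same SEM and uses a large-sample normal approximation (small samples would need a t-test); `two_sided_p` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def two_sided_p(diff_in_sems):
    """Two-sided p value for a difference between two independent means.

    The difference is expressed in units of the (shared) SEM, so the
    standard error of the difference is sqrt(2) SEMs.
    """
    z = diff_in_sems / math.sqrt(2)
    return 2 * (1 - NormalDist().cdf(z))


z95 = NormalDist().inv_cdf(0.975)    # multiplier for a 95% interval, about 1.96

p_sem_touch = two_sided_p(2.0)       # SEM bars just touching: means 2 SEMs apart
p_ci_touch = two_sided_p(2 * z95)    # 95% CI bars just touching
diff_for_p05 = z95 * math.sqrt(2)    # separation (in SEMs) that gives p = 0.05

print(f"SEM bars just touching:    p = {p_sem_touch:.3f}")
print(f"95% CI bars just touching: p = {p_ci_touch:.4f}")
print(f"p = 0.05 needs the means about {diff_for_p05:.2f} SEMs apart")
```

Under these assumptions, SEM bars that just touch are nowhere near p < 0.05, while 95% CI bars that just touch are already well below it. The p = 0.05 point falls in between: a visible gap between SEM bars, and a sizeable overlap between 95% CI bars.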

    btw, 95% confidence intervals are based on SEM x 1.96 (for large samples), i.e. they are roughly twice the size
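To make the arithmetic concrete, here is a small sketch that computes both from the same data. The scores are made up for illustration and the function names are hypothetical. It uses the large-sample normal multiplier; for small samples the multiplier should come from the t distribution and is somewhat larger (about 2.2 for n = 12).

```python
import math
import statistics


def sem(xs):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(xs) / math.sqrt(len(xs))


def ci95_normal(xs):
    """Large-sample 95% confidence interval for the mean (normal multiplier)."""
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
    m = statistics.fmean(xs)
    half = z * sem(xs)
    return (m - half, m + half)


# Made-up scores, purely for illustration (not from any study).
scores = [42, 55, 38, 61, 47, 50, 44, 58, 39, 52, 49, 46]

z = statistics.NormalDist().inv_cdf(0.975)
low, high = ci95_normal(scores)

print(f"mean = {statistics.fmean(scores):.2f}")
print(f"SEM  = {sem(scores):.2f}")
print(f"95% CI (normal approx.) = {low:.2f} to {high:.2f}, multiplier = {z:.3f}")
```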

    Last edited: Feb 12, 2015
  4. Simon


    Monmouth, UK
    This series is now open access:
    Points of Significance : Statistics for Biologists

    I just wanted to add something that came up in the excellent free online stats course I'm doing:
    Statistics and R for the Life Sciences | edX

    In a lecture on how not to present data, Prof Rafael Irizarry showed this slide:

    Those bars on the left often crop up in biomedical ME/CFS papers - see those tiny error bars at the top? It looks like these are really clear-cut differences - but the data on the right is more informative (the overlapping confidence intervals, shown by green vertical lines in the right-hand graph, indicate the differences are not significant):
    Last edited: Apr 10, 2015
