
New attempt to avoid releasing data on 'recovery' from PACE

Discussion in 'Latest ME/CFS Research' started by Esther12, Nov 7, 2012.

  1. Esther12

    Esther12 Senior Member

    Messages:
    5,088
    Likes:
    4,856
    Yeah - that's what I'd guess too. Maybe they realised that having papers defining 'recovery' in a way that overlapped with their own definition for "severe and disabling fatigue" was too transparently absurd.
    ukxmrv likes this.
  2. Bob

    Bob

    Messages:
    7,432
    Likes:
    8,572
    England, UK
    Has anyone come across a definition of 'recovery' in any other research?
  3. alex3619

    alex3619 Senior Member

    Messages:
    6,639
    Likes:
    9,719
    Logan, Queensland, Australia
  4. Bob

    Bob

    Messages:
    7,432
    Likes:
    8,572
    England, UK
    http://www.forward-me.org.uk/Reports/White%20to%20Lancet%20re%20Hooper%20complaint%20(2).pdf

    PACE - Response to the complaint to The Lancet of March 2011

    Extract From White’s letter:

    "The threshold SF36 score given in the protocol for recovery (85) was an estimated mean (without a standard deviation) derived from several population studies. We are planning to publish a paper comparing proportions meeting various criteria for recovery or remission, so more results pertinent to this concern will be available in the future."

    I wonder why the mean score for the general population is considered a 'recovery'?
    I would have thought that the median score (95 for SF-36 PF) would be more appropriate.
    I wonder if median scores have been used in research to indicate a recovery?
    Or, is any particular percentile of SF-36 physical function scores, for the general population, commonly considered a 'recovery'?
  5. user9876

    user9876 Senior Member

    Messages:
    684
    Likes:
    1,548

    I have a problem with the SF-36 PF scale in that I don't think it is valid to use the mean and standard deviation for it. I've been trying to formulate a couple of different arguments; they are complicated and not yet well formed, but I feel a discussion of the properties of the scales they are using as primary outcomes is important. The basic arguments I'm making also need to be better researched in terms of what is required from a scale and the validity of statistics.

    I'm thinking that since improvements were small, then to justify the economics of the treatments they propose, the measurement scales and definitions for improvement or remission need to be accurate. Hence casting doubt over very dodgy measurement techniques becomes important.

    1)

    To me it seems obvious that the SF-36 measures for the general population will give a multimodal distribution, where there is a distribution for healthy people along with distributions for different groups of sick and disabled people. These individual distributions then add up to give us a multimodal distribution. Statisticians will often refer to the central limit theorem and say that the sum of many independent random variables is normally distributed. However, I would argue that these are not independent, since there is a set of healthy people and each group of sick people is a particular deviation from that healthy group, and hence there is a relationship. Thus we are left with a multimodal distribution and it makes no sense to talk of the mean and standard deviation.
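A rough way to see the point about mixed populations is to simulate one. This is only a sketch: the group sizes, means and spreads below are made-up assumptions, not survey data. The only thing taken from the scale itself is that scores run from 0 to 100 in steps of 5 (10 questions with three answer levels).

```python
import random
import statistics


def simulate_population(n=10_000, sick_fraction=0.15, seed=0):
    """Draw hypothetical SF-36 PF scores from a healthy group and a sick group."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n):
        if rng.random() < sick_fraction:
            raw = rng.gauss(45, 15)  # assumed parameters for the sick group
        else:
            raw = rng.gauss(95, 8)   # assumed parameters for the healthy group
        # Round to the nearest multiple of 5, then clamp to the 0..100 range.
        scores.append(min(100, max(0, 5 * round(raw / 5))))
    return scores


scores = simulate_population()
mean = statistics.mean(scores)
median = statistics.median(scores)
sd = statistics.stdev(scores)

print(f"mean      = {mean:.1f}")
print(f"median    = {median:.1f}")
print(f"mean - SD = {mean - sd:.1f}")
```

With a ceiling at 100 and a tail of low scores, the mean is dragged below the median, so "mean minus one SD" does not correspond to any fixed share of the population the way it would for a normal distribution.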

    2)
    The scale has 10 questions, each of which reflects an ability to do a physical task either easily, with some difficulty or with a lot of difficulty. The aim is to measure physical function as a single variable, hence it is based on the theory that physical function is a single measurable thing.

    Let's define a person as having a physical function of x, where x is a member of X (the set of possible physical functions). Let's go a little bit further and define X as a continuous set between 0 and 1 (dead and completely able). What we are really interested in is how different values of x map onto different scores from the SF-36 PF scale. If this is linear (that is, a change y in x1 leads to the same change in the scale as a change y in x2) then the scale is an interval scale and the mean and standard deviation can be used. Otherwise we have a situation where the scale function is simply monotonic and only the median or percentiles can be used. We could also have a situation where the scale is a non-monotonic function, in which case it is only a nominal scale and it only makes sense to talk about how many of each class exist. http://www.mpopa.ro/statistica_licenta/Stevens_Measurement.pdf

    Now suppose we break up the questions so that, rather than having a question q1, we have Q1e, Q1s and Q1d, which represent three sets of people: those who find q1 easy, those with some difficulty and those with a lot of difficulty.

    We can define a person i as a member of Q1e when their x > q1e, and similarly for q1s and q1d, giving us three thresholds defined on the underlying variable of physical function.

    If we do this for all questions we have a set of thirty thresholds which can then be placed onto our interval X (a line from 0 to 1). We do this assuming that there is a well defined ordering of how hard different activities are. We can define this in terms of a set of thresholds t1 to t30 where each maps to one of the question thresholds and ti > ti+1.

    A person's score will be determined by which of these intervals they map into.

    The linearity of the scale will depend on having each of these thirty thresholds evenly distributed across X. If they are clumped or unevenly distributed, the SF-36 scale is not an interval scale and the mean and standard deviation should not be used. To my mind it is up to those using the scale to demonstrate that these intervals are even, to justify their analysis using the mean and standard deviation.

    There could be some errors, in that different people may consider the thresholds to have different orderings. There are two points here. Some of it is just error, where each person perceives the ordering differently. If the points are close and the ordering is debatable then this is not important to the argument, since it still suggests that the scale is non-linear. If people disagree strongly over the different orderings and would place big differences between the thresholds, then I would argue that physical function isn't a single concept to measure and a different analysis would apply, since the scale would then be over multiple variables.
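The threshold argument can be sketched in a few lines. Everything here is an illustrative assumption: the threshold positions are invented, and for simplicity the sketch counts only the 20 thresholds at which the score actually goes up (two per question, 5 points each), rather than thirty.

```python
from bisect import bisect_right


def pf_score(x, thresholds):
    """Score 5 points for every threshold that ability x (0..1) has reached."""
    return 5 * bisect_right(sorted(thresholds), x)


def score_change(x_from, x_to, thresholds):
    """Change in score when underlying ability moves from x_from to x_to."""
    return pf_score(x_to, thresholds) - pf_score(x_from, thresholds)


# Hypothetical case 1: thresholds spread evenly across the ability range.
even = [(k + 1) / 21 for k in range(20)]

# Hypothetical case 2: thresholds clumped near the top of the range.
clumped = [0.71 + 0.014 * k for k in range(20)]

# The same improvement in ability (0.2) at two different starting points.
for name, thresholds in (("even", even), ("clumped", clumped)):
    low = score_change(0.3, 0.5, thresholds)
    high = score_change(0.7, 0.9, thresholds)
    print(f"{name:8s} 0.3->0.5: {low:3d} points   0.7->0.9: {high:3d} points")
```

With evenly spaced thresholds the same change in x gives the same change in score wherever it happens. With clumped thresholds the same change gives nothing at one end and a large jump at the other, which is the case where averaging scores stops making sense.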
    Dolphin likes this.
  6. Esther12

    Esther12 Senior Member

    Messages:
    5,088
    Likes:
    4,856
    ?

    I'm sure I posted a reply to you 9876. Damn it... that took ages! My abrupt version: I thought that maybe you weren't taking sufficient account of the innate difficulty of designing a questionnaire that would assess disability. I don't know if a measure could be constructed which would be an interval scale. There are additional problems specific to the way SF36-PF data was used in PACE, and the way the normal range was defined, and I see how the points you raised also affect some of the assumptions made about these sorts of measures of disability, but I'm not sure how important a point it is in relation to PACE.

    Also, I stumbled upon this blog post, and thought it seemed relevant to PACE.

    http://neuroskeptic.blogspot.co.uk/2011/02/decline-and-fall-of-effects-in-science.html

    Dolphin and biophile like this.
  7. Snow Leopard

    Snow Leopard Senior Member

    Messages:
    2,203
    Likes:
    1,535
    Australia
    Given that the median and modal SF-36 scores are above the mean and the scale sharply cuts off at 100, it makes no sense to talk about normal being within 1 SD, because it is clearly not a normal distribution.
    user9876 and Bob like this.
  8. Bob

    Bob

    Messages:
    7,432
    Likes:
    8,572
    England, UK
    But if you're a PACE statistician, or a Lancet editor, then it makes perfect sense!!!
  9. Mark

    Mark Acting CEO

    Messages:
    4,460
    Likes:
    1,838
    Sofa, UK
    Has anybody tried contacting academic statisticians to put this point to them? Although I have a maths degree, stats was never really a significant part of what I did, but from the little I understand of it, this point about the distribution seems really quite clear. Surely there must be respected statisticians out there somewhere who can put their name to agreeing (with handy quote, perhaps) that it is questionable what they have done here? Or even have a conversation with us here about these issues, to put them in their proper context?
  10. Snow Leopard

    Snow Leopard Senior Member

    Messages:
    2,203
    Likes:
    1,535
    Australia
    About 68% of a normally distributed population is within 1 SD of the mean. So we could just look at the SF-36 data for a healthy population and see the cut-off above which 84% of the population lies (the 68% within 1 SD plus the 16% above the upper bound).

    Because in the end, that is what they are trying to say.
  11. Snow Leopard

    Snow Leopard Senior Member

    Messages:
    2,203
    Likes:
    1,535
    Australia
    The raw data isn't provided, but the graph shows a sharp cutoff at 100 (and reports the ceiling effect), and the mean/SD for different age ranges:

    http://health.adelaide.edu.au/pros/...llbeing_south_australian_population_norms.pdf

    For the 35-44 age group, the 25th percentile PF score is 90 - and this is what I'd consider the cutoff for 'normal'.

    At 45-54, it drops to 80. (increased prevalence of illness - arthritis, type 2 diabetes etc)
    Dolphin likes this.
  12. biophile

    biophile Places I'd rather be.

    Messages:
    1,348
    Likes:
    3,977
    A modification:
    decline-pace.jpg
    Dolphin, Sean and Purple like this.
  13. Bob

    Bob

    Messages:
    7,432
    Likes:
    8,572
    England, UK
    That's what I originally thought, but then I thought about it further, and realised that +/-1SD cuts off the top and bottom 16% of values. In which case we would need to determine the 16th percentile for the normal population.
    Would you agree, or disagree, Snow Leopard?
    I've looked very hard to find the 16th percentile in the normative data, but I can't find it.
    In any case, it's not an appropriate analysis for many reasons, the skewed distribution being just one of them. (It's too late at night to illustrate them all at the mo.)
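For reference, the percentages being discussed for a true normal distribution can be checked with Python's standard library. This is only a check of the textbook figures; it says nothing about whether SF-36 PF scores are actually normally distributed.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

within_1sd = z.cdf(1) - z.cdf(-1)   # share within +/-1 SD of the mean
below_minus_1sd = z.cdf(-1)         # share below mean - 1 SD
within_2sd = z.cdf(2) - z.cdf(-2)   # share within +/-2 SD of the mean

print(f"within +/-1 SD:  {within_1sd:.1%}")
print(f"below mean-1 SD: {below_minus_1sd:.1%}")
print(f"within +/-2 SD:  {within_2sd:.1%}")
```

So for normal data, "mean minus 1 SD" sits at roughly the 16th percentile, and a +/-2 SD range covers roughly 95%. On skewed data with a ceiling, those correspondences no longer hold.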

    It's a good idea, Mark. I don't know any prominent statisticians, but it might be worth trying to find one.
  14. Bob

    Bob

    Messages:
    7,432
    Likes:
    8,572
    England, UK
    ‘Percentiles’ for the HEALTH SURVEY FOR ENGLAND 1996

    http://www.archive.official-documents.co.uk/document/doh/survey96/tab5-18.htm

    Health Survey for England (HSE) 1996 (ages 16+)

    SF-36 Physical Function scores

    All Adults (ages 16+)

    Mean for all adults = 81
    Median for all adults = 95

    36% of adults have the highest/maximum score (100) for Physical Functioning (64th percentile = 100)
    http://www.archive.official-documents.co.uk/document/doh/survey96/ehch5.htm


    25th percentile = 75

    50th percentile (median score) = 95

    75th percentile = 100


    Of course, these aren't age-matched to the PACE Trial, so they aren't exactly appropriate.



    (I always advise people to check for mistakes before quoting me!)
  15. Snow Leopard

    Snow Leopard Senior Member

    Messages:
    2,203
    Likes:
    1,535
    Australia
    Yes, you are right. But the key is that the data for the age/sex matched regular population itself can't be used, because a variety of disabling medical conditions have been excluded from the CFS cohort, which were not excluded from the regular population data.

    I note from that link,


    Which is why I feel the 25th percentile for normal population is appropriate. Or 16th percentile for the population with limiting longstanding illness excluded.
  16. Simon

    Simon

    Messages:
    1,196
    Likes:
    3,557
    Monmouth, UK
    I checked at the time and the PACE use of 'mean - 1SD' does in fact give almost exactly the 16th percentile on the Bowling population data. Though, just as you say, the Bowling data used was for ALL adults (>30% over 65) and would include those with illnesses excluded from PACE. Using the 16th percentile on a healthy working-age population gives a threshold of around 80, not the 60 used in PACE.
    ukxmrv likes this.
  17. user9876

    user9876 Senior Member

    Messages:
    684
    Likes:
    1,548

    I agree that it is hard to construct a measure that can be an interval scale. The point is that in many of the trials and in particular the PACE trial they treat scales as if they were interval scales when they are not. They then go on to define things like 'normal ranges' and clinically useful differences based on these assumptions.

    There seem to be two approaches that could be correct: firstly, to report more dimensions for each questionnaire, or secondly, to use a utility approach such as they do in the EQ-5D survey.

    The point for PACE is that if you have scales that are basically not up to supporting the analysis that is done, then the results published are not trustworthy and should be withdrawn. To me this is not a case of saying there are unpublished trials, but one of saying that trials that have been run don't have a suitable measurement framework. To use an analogy, the way the PACE trial have used their measurements is a bit like me drawing some marks at random on a bit of paper, marking them as distances and then using them to measure the length of stuff.

    There are so many arguments as to why the PACE results cannot be trusted. As well as a mathematical argument around the nature of the scales, there need to be psychological arguments around how CBT frames people's mindset into giving more positive answers to questions (such as in Graham's latest video).
    Dolphin likes this.
  18. Simon

    Simon

    Messages:
    1,196
    Likes:
    3,557
    Monmouth, UK
    Think you are making some very interesting points. Some comments and musings on yours:

    I think they used EuroQol for cost-benefit analysis, not SF-36.

    1. Agreed that mean and SD not appropriate when looking at the general population.
    But should be OK for comparing the means of two patient groups eg SMC vs CBT?

    2. Not an interval scale
    Yes, pretty tough to make an interval scale. I'm pretty sure there is no evidence SF36 is an interval/ratio scale and that's true of most questionnaires generally. Every now and then statisticians complain this invalidates some statistical interpretations, but this view never seems to get much traction :)

    I suspect it is an ordered scale rather than simply giving classes, so it has some utility. Also, even if some patients disagree about some of the ordering, the main thing measured is the within-subject score (pre/post), and so patients presumably score themselves consistently there.

    Also, with the SF-36, for modest change most item scores don't change at all, i.e. they remain 'not limited at all' or 'limited a lot' (which includes 'impossible'). Probably only 2 or 3 out of 10 questions change for most people, and in each case they are moving from:
    - limited a lot > limited a little, or
    - limited a little > not limited
    not sure if this helps make things more consistent or not.
    user9876 likes this.
  19. Snow Leopard

    Snow Leopard Senior Member

    Messages:
    2,203
    Likes:
    1,535
    Australia
    I'd prefer objective measures of activity and neuropsychiatric functioning for measuring recovery/remission, but if they want to use the SF-36 PF score, anything less than 80 cannot be considered a reasonable indication of remission.
    Dolphin likes this.
  20. Bob

    Bob

    Messages:
    7,432
    Likes:
    8,572
    England, UK
    But if using a well-defined population, my understanding is that the common methodology for a 'normal range' is to use +/-2SD, which cuts off the top and bottom 2.5% of values. So it includes 95% of the population.

    Edit: This seems like a sensible definition of a 'normal range' to me, but I'm not sure if the general healthy population can be considered a good example of a 'well-defined' population.
    Simon likes this.
