Discussion in 'Latest ME/CFS Research' started by Dolphin, May 12, 2010.
Ah, I see what you mean about the underlying (per patient) recovery data. And good luck!
Ta Bob. I'd forgotten how many people 'improved' in the SMC group then. I need to go back and re-read the paper at some point...
Re Mark's points: It would be great to get the raw data, free from spin. It's taking ages for them to produce any more papers with meaningful data in... I wonder why?
We need not restrict ourselves to the raw data; we could also ask for information such as emails about how they came to decide on certain ways to process data. This may be very illuminating if it concerns stuff published in their last Lancet article. For example, why not ask for all documentation relating to discussions of how to do the post hoc analysis, including the use of Likert scoring for the Chalder fatigue scale and how they decided on their definition of normal?
I assume such issues would be minuted in meetings and also be contained in emails etc. Even seeing who was at the meetings would be interesting. I feel that the stats are really bad, so was the MRC's statistics person (Tony Johnson) present when they were discussed?
Yes, that's an excellent idea. It could reveal some very interesting answers.
It occurred to me recently that we could have asked via FOI requests how many death threats Simon Wessely had received at the various institutions he works at.
Just wanted to highlight that finding.
And an FOI on the raw data, and maybe other stuff, is definitely a good idea. They have had long enough to do their thing with it.
It is important to remember that although the proportion of participants in the CBT/GET groups who "improved" in fatigue or physical function was only about 15% more than the ~60-65% proportions in the SMC group for these measures, "improvement" here only meant crossing a threshold rather than reaching an absolute score.
This threshold was 2/33 points in fatigue and/or "8", i.e. 10/100 points (resolution of 5), in physical function. When looking at the data presented in the Lancet paper, there is no way the group average advantages in the CBT/GET groups can be explained by the ~15% extra proportion of improvers improving by only 2 points in fatigue and 8 or 10 points in physical function. The improving participants in the CBT/GET groups must have improved more on average than those in the SMC group.
Part of me suspects that there is a small proportion of participants who, for whatever reason, responded really well to CBT/GET and are pulling up the averages for the group. Also, I wouldn't be surprised if the recovery rate is about 10% for CBT/GET and 5% for SMC, but I just based that estimate on visual representations of the mean(SD) which is wildly unsafe. I think oceanblue did some tentative calculations earlier and arrived at about 9% vs 4% respectively?
As others have already pointed out, the misleading returning to "normal" measure did not require participants to actually make any improvements and overlapped with trial entry thresholds, that is why it was not unusual that the proportion advantage was similar between outcomes of improvement and normality (as was NNT).
Another thing to consider regarding "normal" is that this is also another threshold. Therefore, depending on the distribution of the scores, it would be possible for many participants in the SMC group to only be slightly under the threshold and not much different in general to most participants in the CBT/GET group, but for twice as many in the CBT/GET groups to be just over the threshold and therefore classified as returning to "normal" range. In such a situation, CBT/GET isn't impressive but the passionate proponents still get their 15 minutes of fame at the press conference claiming that twice as many people returned to "normal". This was already scandalous anyway because it did not require improvement. However, when looking at the mean(SD) data, I guess some would have had to improve; they couldn't all have been hovering so close to the trial entry thresholds at baseline.
The same problem with thresholds could occur with recovery figures too, although improvement must have occurred for recovery so I will take those figures more seriously than the ridiculous outcome of "normal" range. To find out what really happened, we really need more data. So, regarding the FOI issue: data should be more available, especially when the research is government funded. Perhaps there could be a system similar to the patent system for drugs. The data is stored independently, the authors get 2 years access, then everyone gets access; worst case scenario is that others pay a small fee for access?
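To make the threshold point concrete, here is a rough sketch. Every number in it is hypothetical (two normal distributions with the same spread, means 5 points apart, and an arbitrary cut-off); none of it is PACE data, and real scores are skewed and clipped rather than normal. It only shows that a modest shift in the mean can roughly double the share of people counted as over a threshold.

```python
from statistics import NormalDist

# All values below are hypothetical and chosen only for illustration.
THRESHOLD = 62.0   # arbitrary cut-off for being counted as "normal"
SD = 10.0          # same spread assumed in both groups

smc = NormalDist(mu=50.0, sigma=SD)   # hypothetical comparison group
cbt = NormalDist(mu=55.0, sigma=SD)   # hypothetical treatment group, mean 5 points higher


def share_at_or_above(dist, threshold):
    """Fraction of a distribution lying at or above the threshold."""
    return 1.0 - dist.cdf(threshold)


smc_share = share_at_or_above(smc, THRESHOLD)   # roughly 12%
cbt_share = share_at_or_above(cbt, THRESHOLD)   # roughly 24%
ratio = cbt_share / smc_share

print(f"Over threshold: {smc_share:.1%} vs {cbt_share:.1%} (ratio {ratio:.2f})")
```

So "twice as many" over a cut-off can sit alongside a 5-point difference in group means, which is why the per-patient data matters.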
[Edit: note the important caveat to this post, added in a later post (http://forums.phoenixrising.me/inde...-pace-trial-protocol.3928/page-82#post-272726)]
Re ME criteria in PACE. It does seem odd that basically all Oxford criteria patients without psychiatric disorder then met "London ME criteria" version 2; however, v2 is basically the same as v1 but stripped of all context and anything which possibly interferes with the concept of CFS.
Unlike v1, v2 does not mention autonomic and immune symptoms, which are not required in v1 but "in the right symptomatic context they contribute to the validity of the diagnosis". v1 acknowledges these extra symptoms are not specific to ME, however, two symptoms (alcohol intolerance and hypersensitivity to drugs) are described as "highly specific". Also, some additional symptoms and even physical signs mentioned in v1 as well as other ME criteria may be exclusionary to the PACE Trial but I'm still looking into that and have made much progress, but it is quite involved and messy with lots of ambiguity so will still take time to draw conclusions but I am confident the issue is relevant.
So what we have is a ME cohort which would still probably meet v1 but have been "cleaned up" to fit Oxford/NICE/CDC criteria. There are parts of v1 which would be more difficult to operationalize but PACE seem to have chosen the easy way by just ignoring whatever was difficult. Keep in mind these are the same people who believe the Canadian criteria (CCC) would have been "impossible" to use in research while 2 large studies published in 2011 not only used it but praised it.
Furthermore, there has been some confusion over post-exertional fatigue and malaise between the two versions. v2 only requires post-exertional fatigue, not post-exertional malaise or exacerbation of other symptoms, and there is no mention of time delay nor a 24 hour requirement.
v1 states (asterisks added): "Fluctuation of symptoms,*usually* precipitated by either physical or mental exercise." v2 adds here that (asterisks added): "The usual precipitation by 'physical or mental exercise', should be recorded, but is *not necessary* to meet the criteria." However, later on in v1 it states (asterisks added): "it is *absolutely characteristic* that [symptoms] tend to be exacerbated by physical or mental exertion and the association should always be sought whilst taking the history".
So it is not 100% clear what v1 requires for post-exertional symptomatology but it definitely places more emphasis on it than v2, and quite possibly demands it but just didn't word it properly. Also, I think Goudsmit has commented repeatedly that v2 just isn't the same and some v1 patients would have been excluded from the PACE Trial. Personally, I think PACE could and should have used the CCC or equivalent as a parallel rather than just a subgroup of Oxford/NICE/CDC, but that is a possible ethical catch-22: should we have risked the health of ME/CFS patients in order to generalize the PACE findings to them?
The CCC would also have contradicted the PACE emphasis on fatigue as *the* main symptom, as it not only requires multiple main symptoms but fatigue doesn't have to be the dominant one. The 2009 revised version of the CCC, obviously not available to PACE at the right time, doesn't even require the fatigue criterion for a diagnosis if the other 5 criteria are met. PACE wouldn't have been very happy about that!
I have had a hard time keeping track of versions 1 & 2 of the London ME criteria, although I have heard that Ellen Goudsmit said they used version 2, not version 1.
Judging by what you wrote, I think this is version 1:
and this is version 2:
Could anyone help me with this please?...
According to wikipedia, and some other sources, a 'normal range' in a medical context, is intended to include 95% of values, and this analysis should be carried out on the data for a healthy population in order to determine the normal range for the healthy population. (A normal range analysis can also be carried on people with disease to find out the normal range for people with that disease, or for a whole population to find out the normal range of values for the whole population.)
To find 95% of values, it is necessary to use the analysis of plus/minus 2 SDs from the mean.
But, in the PACE Trial, they used plus/minus 1 SD from the mean to find the 'normal range', but on the whole adult population.
Does anyone have any insight into this?
i.e. Why they used 1 SD instead of 2 SD?
Was the difference related to them analysing the whole adult population rather than the healthy population?
I've struggled to find much info on usual practice for the normal range.
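For reference, a quick check of what +/- 1 SD and +/- 2 SD actually cover, assuming the scores really are normally distributed (which, for these questionnaires, is a big assumption). It uses only Python's standard library; the function names are just illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)


def coverage_within(k):
    """Fraction of a normal distribution within +/- k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


def fraction_below(k):
    """Fraction of a normal distribution below (mean + k SDs)."""
    return std_normal.cdf(k)


within_1sd = coverage_within(1)        # about 68%
within_2sd = coverage_within(2)        # about 95%
below_minus_1sd = fraction_below(-1)   # about 16%
below_minus_2sd = fraction_below(-2)   # about 2.3%

print(f"+/- 1 SD covers {within_1sd:.1%}, +/- 2 SD covers {within_2sd:.1%}")
print(f"Below mean - 1 SD: {below_minus_1sd:.1%}, below mean - 2 SD: {below_minus_2sd:.1%}")
```

So a mean minus 1 SD cut-off on normal data leaves about 16% of the reference population below it, against roughly 2.3% for mean minus 2 SD.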
The first time I saw this method was in this paper:
As one can see, this included White PD as well as Knoop H & Bleijenberg G, authors of accompanying Lancet editorial on PACE Trial. It said:
(my bold & underlining)
Thanks for that Dolphin. I'll have a read through that tomorrow.
I still can't see any papers which discuss or explain their use of 1 SD.
It doesn't seem to be usual practice.
Everything that I've read says that +/- 2 SD should be used, to include 95% of the population.
I've seen one or two other unrelated studies using +/- 1 SD, but I haven't found an explanation for that methodology yet.
The +/- 2 SD analysis should be done on the healthy population in order to determine the normal range for the healthy population. If they had used that methodology, then we would actually be able to see who had moved into the normal range.
Has anyone else got any ideas about this?
It's funny how a lot connected with CBT/GET is not usual practice. They redefine stuff all the time. I think this is a spin tactic built into their methodology. You can't refute something that is a little different. They also cannot claim it's the same as elsewhere, but somehow they often do.
It all seems largely dependent on how healthy is defined. Blood tests may be different than subjective questionnaires. PACE using a general population to derive "normal" range for middle-aged otherwise-healthy participants was highly dubious, but perhaps the mean minus 1SD rule is better used when there is a ceiling effect on the maximum score?
Also, imagine how absurd it would have been for PACE to use the mean minus 2SD rule on the Bowling et al general population data, the threshold for "normal" would then have been 35/100, making most participants "normal" at baseline! The irony of course is that even the 2SD variation of the rule on a healthy population gives a physical function score of more than 60/100 points (about 70), and if using healthy matched controls this still might be more like 80 or 85 (in most CFS studies I've seen reporting the physical function score of matched controls, the mean was in the 90's and the SD was rather small).
I suspect the reason Knoop used +/- 1SD is that he was looking at recovery, not norms. It would be hard to argue that the threshold for recovery included those between -1 & -2 SD (bottom 2.5%-16% of the population) since people with CFS were unlikely to be there en masse before they got ill. More reasonable to set the recovery threshold a little higher, hence -1SD, though that's just speculation on my part.
As Biophile points out, SDs for a healthy population - which is more homogeneous by definition - are smaller than for the full population. From what I've seen, the mean for SF36 PF is around 95 for a working age healthy population with SD around 10, which would give >=75 as 'normal' and >=85 as recovered. Hope this helps.
Yes, I also thought that it might be related to the ceiling effect, but I've still not found any info about it being usual practice.
Yes, using SDs for skewed distributions is never going to get a helpful result, is it?
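The arithmetic behind those two cut-offs, as a small sketch. The mean of 95 and SD of 10 are the approximate figures quoted in the post above, not values taken from any dataset here, and the function name is made up for illustration.

```python
def lower_threshold(mean, sd, k):
    """Lower cut-off at k standard deviations below the mean."""
    return mean - k * sd


# Approximate figures quoted in the post for a healthy working-age population.
healthy_mean = 95
healthy_sd = 10

normal_cutoff = lower_threshold(healthy_mean, healthy_sd, 2)     # mean - 2 SD
recovered_cutoff = lower_threshold(healthy_mean, healthy_sd, 1)  # mean - 1 SD

print(f"'Normal' (mean - 2 SD): >= {normal_cutoff}")       # >= 75
print(f"'Recovered' (mean - 1 SD): >= {recovered_cutoff}")  # >= 85
```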
But I'd still like to know why they thought it was acceptable to use 1 SD in such a high profile study.
What peer reviewed research did they base this methodology on?
Surely they can't have just invented it? (although, like Alex, I wouldn't put it past them!)
Yes, they should have done the analysis on a healthy population, to find out what the healthy normal range is.
But even then, of course, it would be meaningless because of the skewed distribution.
I think an SF-36 PF score somewhere above 75 would be appropriate.
Still looking for some literature on usual practice using 1 SD, that I can use as a reference, so if anyone finds any, I'd be grateful if you could post it.
The normal range assumes a normal distribution over a single test. The Chalder fatigue scale isn't a single test but includes four separate components including mental fatigue and physical fatigue (they are separate according to Chalder's original paper). It's a bit like talking about size as a combination of height and weight and then saying let's look at the mean size. It simply doesn't make any sense.
If they used 2 SD then it would give them a score in the normal range of up to 23.4, which is around the mean level for all the groups after treatment, which would just make them look stupid to even those skimming the paper. Interestingly, if you take their mean values after treatment, adding 2 SD would have you over the max value for their scale.
Not only should they use a healthy population but they should reject any outliers so as not to skew the stats. In the past I have rejected everything outside 2.54 * the median absolute deviation as an outlier. I seem to remember it being a fairly standard technique.
It's not just that the distributions are skewed, they are also clipped. That is, there is a hard edge over which you cannot have values, but given the numbers quoted for the mean and std there seem to be values up to this hard edge. It does have the effect of making the maths really hideous.
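A minimal sketch of that kind of outlier rejection, assuming "outside 2.54 * the median absolute deviation" means further than 2.54 MADs from the median. The function name and the sample scores are made up for illustration; only the 2.54 multiplier comes from the post.

```python
from statistics import median


def reject_outliers_mad(values, k=2.54):
    """Keep values within k median absolute deviations (MAD) of the median.

    Assumes 'outside k * MAD' means |x - median| > k * MAD.
    """
    values = list(values)
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values equal the median (e.g. a strong ceiling
        # effect), so the MAD cannot scale anything; return data unchanged.
        return values
    return [v for v in values if abs(v - med) <= k * mad]


# Hypothetical scores, not real survey data.
scores = [95, 100, 90, 85, 100, 95, 90, 20]
kept = reject_outliers_mad(scores)
print(kept)  # the score of 20 is dropped
```

Note the zero-MAD case: on data where most people score the maximum, the MAD collapses to zero and this rule stops being usable, which is another side of the clipping problem.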
Yes, 36% of the population have an SF-36 PF maximum score of 100, according to the Health Survey of England 1996. And the median is a score of 95.
Health Survey for England (HSE) 1996 (ages 16+)
SF-36 Physical Function scores
Mean for all adults = 81
Median for all adults = 95
25th percentile = 75
50th percentile = 95 (median score)
75th percentile = 100
36% of adults have the highest/maximum score for Physical Functioning, of 100. (64th percentile = 100)
One observation: while using norm-based stats on non-normal population data should be nonsense, as it happens the mean -1SD formula used by PACE sets the threshold at around the 16th percentile for the Bowling population data they used, pretty well where it would be if the data had been normally distributed. I don't know if this is chance or not, but it is worth noting.
The difference between the mean and the median suggests that there are a significant number of low scorers in England, but this would tie in with numbers claiming disability living allowance.
If you look at some of the age data it's more interesting:
         Men               Women             All
Age      75th 50th 25th    75th 50th 25th    75th 50th 25th
16-24    100  100  95      100  100  90      100  100  90
25-34    100  100  95      100  100  90      100  100  90
35-44    100  100  90      100  95   85      100  95   90
45-54    100  95   85      100  90   75      100  95   80
55-64    95   90   65      95   80   56      95   85   60
65-75    90   80   55      90   70   40      90   75   45
75+      83   65   35      75   50   20      80   55   25
To me this shows the importance of age matching the general population sample with the set to see who has recovered. The PACE Trial quotes a mean age of around 39, which would put the 25th percentile at somewhere between 90 and 85 depending on sex. Remember this data includes healthy and sick and disabled people.
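The age-matching lookup can be sketched as below, using the percentile figures from the table above. The data structure and function name are made up for illustration, and the upper edge of the "65-75" band is treated as 74 so it does not overlap with "75+" (that is an assumption, not something the table states).

```python
# SF-36 physical function percentiles by age band, copied from the table above.
# Each entry: (lowest age, highest age or None, {group: (75th, 50th, 25th)}).
# Assumption: the "65-75" band is treated as 65-74 to avoid overlapping "75+".
HSE_1996_PF = [
    (16, 24, {"men": (100, 100, 95), "women": (100, 100, 90), "all": (100, 100, 90)}),
    (25, 34, {"men": (100, 100, 95), "women": (100, 100, 90), "all": (100, 100, 90)}),
    (35, 44, {"men": (100, 100, 90), "women": (100, 95, 85), "all": (100, 95, 90)}),
    (45, 54, {"men": (100, 95, 85), "women": (100, 90, 75), "all": (100, 95, 80)}),
    (55, 64, {"men": (95, 90, 65), "women": (95, 80, 56), "all": (95, 85, 60)}),
    (65, 74, {"men": (90, 80, 55), "women": (90, 70, 40), "all": (90, 75, 45)}),
    (75, None, {"men": (83, 65, 35), "women": (75, 50, 20), "all": (80, 55, 25)}),
]


def pf_percentiles(age, group="all"):
    """Return the 75th/50th/25th percentile scores for an age and group."""
    for low, high, rows in HSE_1996_PF:
        if age >= low and (high is None or age < high + 1):
            p75, p50, p25 = rows[group]
            return {"p75": p75, "p50": p50, "p25": p25}
    raise ValueError(f"No age band covers age {age}")


# Mean age of around 39, as quoted for the trial in the post above.
for group in ("men", "women", "all"):
    print(group, pf_percentiles(39, group))
```

For age 39 this gives a 25th percentile of 90 for men and 85 for women, which is the "between 90 and 85" range mentioned above.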