PACE Trial and PACE Trial Protocol

oceanblue · Mar 10, 2011

Esther12 said:
There are loads of good points in here, but I'm a bit worried that some might get lost.

I clumsily started a wiki here...
Or maybe we should wait til we've got a better idea of the key points, and then try to collect them all?

I know, all this awesome analysis, what's going to happen to it, and more particularly what are we going to DO with it?

The wiki looks like a good idea to me.

My dream is for a comprehensive rebuttal piece to be published somewhere like the BMJ, written by someone with credibility in the ME/CFS field that would take on maybe FINE as well as PACE, and use them to tackle the biopsychosocial argument head on. The biggest and bestest research came up with... nothing.

I also think it would be helpful to have, somewhere, a user-friendly exposure of PACE - perhaps as a series of shortish pieces in a blog. I'm still mulling this over - comments welcome.

Dolphin · Mar 10, 2011

oceanblue said:
I agree those figures look like drop outs, but maybe they are people who skipped quite a few sessions along the way, but were there at the end?

Yes, that's what they were by definition:

Adequate treatment was ten or more sessions of therapy or three or more sessions of specialist medical care alone.

My point is they are like "de facto" drop-outs/people who couldn't cope with the GET or CBT programme. Turning up for the assessment at 24 weeks and/or 52 weeks doesn't mean one stuck to the programme. I think it's interesting that the number of these is highest for GET.

oceanblue · Mar 10, 2011

biophile said:
Unimpressive results when compared to actual healthy populations, flawed definitions of "normal" for both the primary measures, questionable processing of potential participants, strawmanned version of pacing, omitted measurements, massive goalpost shifting, lack of actigraphy at follow up which they know would probably show no improvement, unexpected improvement trends in the "control" group, possible reactivity bias (placebo response, observer-expectancy effect, Hawthorne effect), ceiling effect of the Chalder fatigue questionnaire, stated conflicts of interest in the paper itself, anything important I miss?

For anyone willing to look beneath the surface results, the PACE trial ironically shows that contrary to the uncritical hyperbole surrounding it, the rationale and effectiveness of CBT/GET is unconvincing even when the researchers have ample resources, a carefully selected cohort, a highly controlled environment using their best methods refined over 20 years, a chance to redefine expectations, and an opportunity to dress up the data to make it look better than it really is.

If the rationale for GET was correct, we should expect a large improvement or even a return to the average of the healthy population, perhaps going further because of the "ex-patients" being encouraged to continue exercising as long as severe exacerbations are avoided. 12 months should be enough time to reverse the "deconditioning" that allegedly perpetuates symptoms of CFS. However on average, there was no such improvement in the PACE trial, so I think it is safe to say that the role of deconditioning, if any, is generally minor and has been grossly exaggerated.

Hi Biophile

You operate on a slow-burn: a long wait then you arrive with this awesome post!

Graphs really are our friend here. It is more effective to illustrate a reality check for the hyperbole using visual representations than a bunch of figures, it just takes a hell of a lot more time!

Very true: the graphs are really helpful.

As for what the right 'normal' group should be for SF-36 comparisons, I think it would ideally be an age and sex matched group of healthy people ie excluding those reporting a long-term health problem. Without the raw data from population studies like Bolwing, we can't calculate this but it looks like the resulting threshold would be about 80, the same as you propose. This study by Knoop (& White!), using a healthy population for reference, comes out with the same threshold score of 80.

When Bleijenberg & Knoop said PACE had used a 'strict definition of recovery' it was because they incorrectly thought that PACE has used a healthy population to define 'within the norm'. Which is pretty unimpressive in an editorial.

Thanks for all the great work, Biophile

Angela Kennedy · Mar 10, 2011

Ad hoc nature of 'Specialist Medical Care'

On page 3 of the Lancet pdf, 'Specialist Medical Care' is described thus:

"SMC was provided by doctors with specialist experience in
chronic fatigue syndrome (webappendix p 1). All participants
were given a leafl et explaining the illness and the nature of
this treatment. The manual was consistent with good medical
practice, as presently recommended.(2) Treatment consisted of
an explanation of chronic fatigue syndrome, generic advice,
such as to avoid extremes of activity and rest, specific
advice on self-help, according to the particular approach
chosen by the participant (if receiving SMC alone), and
symptomatic pharmacotherapy (especially for insomnia,
pain, and mood)."

This indicates all participant groups were subject to ad hoc treatment, especially pharmaceutical: such as anti-depressants!

This is yet another confounder. It would be theoretically possible that, for example, GET and CBT participants had more people on anti-depressants, thus self-reporting more positively because they are on 'happy pills' (!) or even if on painkillers, or sleep medication! It would then give a falsely positive result to those treatments on the self-reports.

Ad hoc treatment in a big trial like this? Blimey.

Bob · Mar 10, 2011

Hello everyone,

I've been away from the forum for a while, working on PACE Trial projects, so I've got to read back over this thread at some point...

But in the mean time, I'd really appreciate getting some feedback on the following please? (Sorry if I'm repeating stuff that's been covered earlier.)

I think a direct translation of the bimodal scoring to Likert continuous scoring would be as follows:

0 0 1 1 (bimodal)
0 1 2 3 (Likert)

bimodal scores translated to Likert continuous scores:
0 0-1
1 2-3
2 4-6
3 6-9
4 8-12
5 10-15
6 12-18
7 14-21
8 16-24
9 18-27
10 20-30
11 22-33

Changes in the definition of 'Normal Range':

The original 'normal range' in the protocol would be directly translated from '3 or less' (bimodal) to '6 to 9 or less' (Likert continuous).
So to translate the original protocol to use Likert scores instead of bimodal scores, they could not go above '9' for the 'normal range' using Likert, and they could not safely (i.e. with certainty) go above '6' using Likert to define the 'normal range'. But they have completely changed the definition of 'normal range' from '6 to 9 or less' to '18 or less'.

And on a closely linked subject...
It seems to me that a score of 12 or above on the Chalder questionnaire, would indicate a trend to an overall worsening of health...
If, on average, the answers tend towards "worse than usual", which would be the case for an average score 12 or above (a score of 11 being neutral, and not indicating any change either way), then that would indicate an overall worsening of health.
So how can a score of '18' indicate a 'normal range', when it actually indicates an overall deterioration in health?

The original bimodal 'normal range' of '3 or less' did actually indicate an improving participant.
But the new likert 'normal range' of '18 or less' includes participants who are worsening in overall health.

Do you think these interpretations are solid?

anciendaze · Mar 10, 2011

Dolphin said:
Yes, that's what they were by definition:
I point is they are like "de facto" drop-outs/people who couldn't cope with the GET or CBT programme. Turning up for the assessment at 24 weeks and/or 52 weeks doesn't mean one stuck to the programme. I think it's interesting that the number of these is highest for GET.

The apparent correlation between 'adverse events' and improvement should be highlighted. I wonder what detailed data would show. Should we suggest comparison with a treatment which only causes 'adverse events'.?

There is precedent. Having candidate astronauts stick a foot in a bucket of ice water, and leave it there for some minutes, had no relation to performance, but it was a great way to weed out those who were not committed.

Esther12 · Mar 10, 2011

@ Bob.

Your conversions of bimodal to Likert look right.

On the 'less than normal' point - I think that the questionnaire is meant to make it clear that it's 'normal' prior to being ill that is referred to, not 'normal' of being ill. It is a strange way of talking to patients who may have been ill for some time. There scores can be improved just by making them think that they used to feel more tired, rather than improving current levels of fatigue.

Dolphin · Mar 10, 2011

I put the "Adequate Treatment" figures into a stats calculator http://www.graphpad.com/quickcalcs/contingency2.cfm. No significant differences.

Analyze a 2x2 contingency table
Not adequate treatment Adequate treatment Total
GET 24 136 160
APT 16 143 159
Total 40 279 319

Fisher's exact test
The two-tailed P value equals 0.2365
The association between rows (groups) and columns (outcomes)
is considered to be not statistically significant.

Learn how to interpret the P value.

The Fisher's test is called an "exact" test, so you'd think there is exactly one way to compute the P value. Not so. While everyone agrees on how to compute one one-sided (one-tailed) P value, there are actually three methods to compute "exact" two-sided (two-tailed) P value from Fisher's test. This calculator uses the method of summing small P values Read more. Prior to 5-April-2004 this QuickCalc used the "mid-P" calculation which resulted in a different two-tailed P value.

It might be interesting to see the data on patients diagnosed with stricter criteria.

Dolphin · Mar 10, 2011

biophile said:
The average age of the participants in the PACE trial was 38 years (SD = 11?), so it is also worthwhile looking at these age groups in the general population rather than the entire "working population".

Just to correct this typo/similar: they used "working age" not people actually in employment.
ETA: I see you make this point later on.

Because of the way CFS is defined, where people are generally excluded if they have other conditions (unlike with some other diseases), their normal just be close to a healthy normal.

Love the graphs - thanks.

Bob · Mar 10, 2011

Esther12 said:
@ Bob.

Your conversions of bimodal to Likert look right.

Thank you.

Esther12 said:
@ On the 'less than normal' point - I think that the questionnaire is meant to make it clear that it's 'normal' prior to being ill that is referred to, not 'normal' of being ill. It is a strange way of talking to patients who may have been ill for some time. There scores can be improved just by making them think that they used to feel more tired, rather than improving current levels of fatigue.

Ah, thank you Esther, that's cleared up a lot of confusion for me!

Dolphin · Mar 10, 2011

Note on the point of using 80 as the threshold: if one have a recovered group that are like normal, it means it is possible the could be derived from the same distribution.
If most of the scores were 80/85 and a few 90+, such a group (which might have a mean in the low 80s or even a bit higher) could still be different from a healthy population with mean of 92/93. So shouldn't be described as a normal group.

Dolphin · Mar 10, 2011

Angela Kennedy said:
On page 3 of the Lancet pdf, 'Specialist Medical Care' is described thus:

This indicates all participant groups were subject to ad hoc treatment, especially pharmaceutical: such as anti-depressants!

This is yet another confounder. It would be theoretically possible that, for example, GET and CBT participants had more people on anti-depressants, thus self-reporting more positively because they are on 'happy pills' (!) or even if on painkillers, or sleep medication! It would then give a falsely positive result to those treatments on the self-reports.

Ad hoc treatment in a big trial like this? Blimey.

We are given the figures for the numbers and percentages on antidepressants in each of the groups in Table 2.

Dolphin · Mar 10, 2011

Bob said:
But in the mean time, I'd really appreciate getting some feedback on the following please? (Sorry if I'm repeating stuff that's been covered earlier.)

I think a direct translation of the bimodal scoring to Likert continuous scoring would be as follows:

0 0 1 1 (bimodal)
0 1 2 3 (Likert)

bimodal scores translated to Likert continuous scores:
0 0-1
1 2-3
2 4-6
3 6-9
4 8-12
5 10-15
6 12-18
7 14-21
8 16-24
9 18-27
10 20-30
11 22-33

Afraid that's not correct. Somebody with a score of 6 could score 6x3 and 5x1 to score 23. So a range of 12-23.

Unlikely to be 12 in practice as that involves somebody with 5 symptoms less than usual and 6 symptoms much more than usual.

Bob said:
The original 'normal range' in the protocol would be directly translated from '3 or less' (bimodal) to '6 to 9 or less' (Likert continuous).

3 can go all the way up to 17 (8x1 plus 3x3).
However, that's not to say that 17 should be the threshold - one would have to look at the raw data (which we don't have) from the study which validated 3- as a threshold for "normal" fatigue. It is unlikely to be 17-.

It's all very odd/dodgy that they would suddenly stop using this threshold that Trudie Chalder was herself involved in validating (validation involved comparing scores to another scale).

Dolphin · Mar 10, 2011

Esther12 said:
On the 'less than normal' point - I think that the questionnaire is meant to make it clear that it's 'normal' prior to being ill that is referred to, not 'normal' of being ill. It is a strange way of talking to patients who may have been ill for some time. There scores can be improved just by making them think that they used to feel more tired, rather than improving current levels of fatigue.

Yes, well said.

Angela Kennedy · Mar 10, 2011

Dolphin said:
We are given the figures for the numbers and percentages on antidepressants in each of the groups in Table 2.

Thanks Dolphin. Got that now.

Esther12 · Mar 10, 2011

Dolphin said:
Afraid that's not correct. Somebody with a score of 6 could score 6x3 and 5x1 to score 23. So a range of 12-23.

Unlikely to be 12 in practice as that involves somebody with 5 symptoms less than usual and 6 symptoms much more than usual.

3 can go all the way up to 17 (8x1 plus 3x3).
However, that's not to say that 17 should be the threshold - one would have to look at the raw data (which we don't have) from the study which validated 3- as a threshold for "normal" fatigue. It is unlikely to be 17-.

It's all very odd/dodgy that they would suddenly stop using this threshold that Trudie Chalder was herself involved in validating (validation involved comparing scores to another scale).

Thanks for pointing that out. My brain's gone soft.

Bob · Mar 10, 2011

Dolphin said:
Afraid that's not correct. Somebody with a score of 6 could score 6x3 and 5x1 to score 23. So a range of 12-23.

Thanks for that Dolphin.

Angela Kennedy · Mar 10, 2011

"Depressive Disorder only" in the mix?

Ok - I'm working on the hop at the moment- so any help on this will be appreciated.

As people may understand, my specific interest has been who was in this trial (and who was left out) in terms of illness.

"figure 2" on page 8 of the Lancet article pdf has graphs etc.

They state figures for 'All participants'' 'International CFS only'; 'London ME only' and 'Depressive disorder only'.

Firstly, do we have actual figures for the three apparent sub groups of patients (and how many of each were allocated to treatments)?

Secondly, Where have the 'depressive disorder' group come from?

(i have not been able to find this data. But it might be me being dense - to be quite honest)...

Dolphin · Mar 10, 2011

Angela Kennedy said:
They state figures for 'All participants'' 'International CFS only'; 'London ME only' and 'Depressive disorder only'.

Firstly, do we have actual figures for the three apparent sub groups of patients (and how many of each were allocated to treatments)?

Figures are in Table 1.

BTW, I think I said to you somewhere that Figure 2 showed there wasn't a different response for subgroups (because of what the authors said). However, somebody elsewhere has happened to point that the red overlap in Figures 2F and 2G is interesting. It'd be good if somebody could get the means and standard deviations out of the authors and then we could do a t-test.

Angela Kennedy said:
Secondly, Where have the 'depressive disorder' group come from?)

These aren't new patients; they are Oxford criteria patients who have a depressive disorder.

The text says:

Thereafter allocation was
stratifi ed by centre, alternative criteria for chronic fatigue
syndrome12 and myalgic encephalomyelitis,13 and
depressive disorder (major or minor depressive episode or
dysthymia),14

14 First MB, Spitzer RL, Gibbon M, Williams JBW. Structured Clinical
Interview for DSM-IV-TR axis I disorders, research version, patient
edition with psychotic screen (SCID-I/P W/ PSY SCREEN).
New York: Biometrics Research, New York State Psychiatric
Institute, 2002.

so they used the SCID results (rather than, say, the HADS scores) to define depressive disorders.

Angela Kennedy · Mar 10, 2011

Dolphin said:
Figures are in Table 1.

THANK you! I've been looking for those bad boys most of the day on and off (currently having to share a computer- which is not fun!)

Ok - they're confusing... Obviously my premise is that the whole trial could - possibly- include not a single neurological ME sufferer, and an unquantifiable few at most, because of the exclusion criteria, and it appears that the London- already problematic in my opinion because of vague categories, has been made even 'vaguer' by White et al, and superfluous in the face of the stringent exclusion process.

Nevertheless, one interest is in how 'depressive disorder' is a category here, as well as the supposed 'ME' sufferers.

BTW, I think I said to you somewhere that Figure 2 showed there wasn't a different response for subgroups (because of what the authors said). However, somebody elsewhere has happened to point that the "yellow" (what colour is it?) overlap in Figures 2F and 2G is interesting. It'd be good if somebody could get the means and standard deviations out of the authors and then we could do a t-test

thanks for that D. I'll have a look at the yellow.

PACE Trial and PACE Trial Protocol

Guest

Senior Member

Guest

Senior Member

Senior Member

Senior Member

Senior Member

Senior Member

Senior Member

Senior Member

Senior Member

Senior Member

Senior Member

Senior Member

Senior Member

Senior Member

Senior Member

Senior Member

Senior Member

Senior Member