PACE Trial and PACE Trial Protocol

oceanblue · May 9, 2011

Response bias? The graph

I plotted SF36 scores against 6MWT scores to see if that would throw some light on response bias. If response bias happens, you would expect to see this at the end of the trial (52 weeks), when factors like wanting to support the therapist/doctor come into play but not at baseline, when particpants don't even know which therapy they will get.

The graph shows a pretty consistent relationship between SF36 scores and 6MWT scores at baseline, shown by the blue trend line (and red squares). At 52 weeks there isn't such a clear pattern (green triangles), but the SF36 scores are higher than you would predict given the relationship between SF36 and 6MWT at baseline. It's as if the SF36 scores have been boosted a little - which could be the response bias.

- WSAS is data for the 4 quartiles given in the Cella paper, using PACE baseline data for the Work and Social Adjustment Scale. It's the same basic data as PACE baseline, but chopped up a different way. Nb all these data points are group means (each group is around 150 people), not data for individual participants.

This is all speculation though and comments welcome.
ps happy to supply excel file with data and graph if anyone wants it.

Esther12 · May 9, 2011

Thanks for that graph OB. It really makes the point well.

anciendaze · May 9, 2011

I think oceanblue makes a good case for response bias, as a subjective effect. This doesn't seem as significant to me as the fact we have seen that at least one therapist made a determined attempt to keep a patient in the trial, to the point of violating ethics and the protocol, while those running the trial tolerated some 30% of those "completing" the trial without providing any objective data on improvement. How can you count a patient as completing without getting data from them? To my mind these were dropouts hidden under another name. Treat them as such, and any validity for improvements in objective measures vanishes.

Apparently, proposals for objective measures were only used to get funding, and were later dropped as inconvenient.

As always, I feel the distribution of activity scores in the general population is only approximately Gaussian in the sense that 85=95 (mean = mode). If I treat that distribution as a spike and a slab, I see a normal (Gaussian) spike of relatively healthy people with a mean of 95 and an SD of about 10, plus a slab of aging or ill people uniformly distributed across the range. The reason this tapers off at the low end is that the data are based on people visiting doctors, and those below some limit were not able to do so. These two parameters were the basis for the published criteria for recovery, as well as all measures of statistical significance. This emperor has no clothes whatsoever.

Dolphin · May 9, 2011

anciendaze said:
The reason this tapers off at the low end is that the data are based on people visiting doctors, and those below some limit were not able to do so.

I think you're think of the Chalder Fatigue Questionnaire data rather that SF-36 normals which were based on postal questionnaires, and the like.

The problem with the Chalder Fatigue Questionnaire population data was more that the people who didn't attend a GP in the previous 12 months were included, and that would likely be a particularly healthy group covering 20-30 (?) % of the population.

anciendaze · May 9, 2011

Dolphin said:
I think you're think of the Chalder Fatigue Questionnaire data rather that SF-36 normals which were based on postal questionnaires, and the like..

At this point it is likely my poor damaged brain is fried. The reasoning in this study resembles that used in old Star Trek episodes to dispatch unwanted robots. Personally, I'm very sure I was not talking about fatigue scores. The references they cited don't support the values used for mean and SD, unless mean and mode are quite different. However, I'll drop it as not worth the effort to understand.

What do you think about the idea of patients who did not provide objective data for judging improvement being counted as completing the trial? Imagine running a race in which competitors could have either starting or finishing times without necessarily having both. (Oh, did I mention that they started separately?) You could then interview them to decide who won, based on their own opinions of their performance. I could imagine this crew nodding heads, and saying, "that sounds eminently reasonable."

Dolphin · May 9, 2011

anciendaze said:
Dolphin said:

anciendaze said:

The reason this tapers off at the low end is that the data are based on people visiting doctors, and those below some limit were not able to do so.

Click to expand...

I think you're think of the Chalder Fatigue Questionnaire data rather that SF-36 normals which were based on postal questionnaires, and the like.

Click to expand...

At this point it is likely my poor damaged brain is fried. The reasoning in this study resembles that used in old Star Trek episodes to dispatch unwanted robots. Personally, I'm very sure I was not talking about fatigue scores. The references they cited don't support the values used for mean and SD, unless mean and mode are quite different. However, I'll drop it as not worth the effort to understand.

I know you are generally talking about SF-36 PF scores. But I specifically replied to the bit where you say there is a problem with the population data (because I presume that is what you are talking about) as it is obtained from those visiting doctors.

I remain convinced my point is correct. The reason I make such points is that I think it is important that we are exact in what we are saying. Many of us are writing to medical journals.

However, you say you are very sure you were not talking about fatigue scores; if so, I will be interested in seeing the evidence.

anciendaze said:
What do you think about the idea of patients who did not provide objective data for judging improvement being counted as completing the trial? Imagine running a race in which competitors could have either starting or finishing times without necessarily having both. (Oh, did I mention that they started separately?) You could then interview them to decide who won, based on their own opinions of their performance. I could imagine this crew nodding heads, and saying, "that sounds eminently reasonable."

I have been a big proponent that this trial needed objective outcome measures. Indeed I remember being told on a list in 2003 the trial would be using actometers; then at some later stage, discovering they were dropped.

Also, even if they had used actometers, they probably would have used them as secondary measures while I would have wanted them as a primary outcome measure. Also as part of the definition of recovery.

It is interesting that they don't have 6MWDs for some groups; I have seen analyses called sensitivity analyses performed before where they look at various options e.g. "worst case scenario for missing values", "best case scenarios", etc. A slight problem we have here is that a lot of the people in the other groups didn't do it either. A lot of people in the world have somewhat disorganised lives or for whatever reason don't do all things they should e.g. if taking part in a research trial, do all the tests. It probably can't all be put down to them being made worse.

oceanblue · May 10, 2011

Chalder Fatigue Scale normative data is suspect

PACE set a threshold of 'normal' fatigue as a CFQ score of 18 or less, based on a mean of 14.2 and SD of 4.6, which were taken from a 2010 study, Measuring fatigue in clinical and community settings. But these figures may well not be representative of the working age population.

The way they selected the participants is complex, and the underlying data was collected published in 1994: Population based study of fatigue and psychological distress (T Pawlikowska, T Chalder, S R Hirsch, P Wallace, D J M Wright, S C Wessely), and this is where the holes start appearing.

Crucial bit is point 4 if you're pressed for time.

1. This does not appear to be a representative sample
They mailed registered patients at several different types of practice, but made no attempt to match them to the population (vs Bowling who showed SF36 data was based on a cohort well-matched with census data).

We sent questionnaires to all patients aged 18 to 45 years registered with selected practices, 3 from London and three from rural or semirural settings.

Two practices (subsequently referred to as practice 1) were working from the same health centre in south London in a mixed urban community with a large amount of temporary accommodation. Practice 2 was an inner city London practice where most patients were socially deprived. The third practice was located on the Surrey-Hampshire border with patients predominantly in socioeconomic classes II and III. Practice 4 was in an urban area of a south coast port, and the last practice was in a Somerset village, with a static close knit community and many stable families. The total number of patients aged 18-45 registered with the practices was 31 651 (15 222 men, 16 429 women).

- They restriced it to patients age 18-45, though didn't explain why (though this particular feature is likely to bias towards healthier individuals).

2. Low response rates could lead to biased findings
The response rate was only 48%. After investigating non-responders they found that many had moved (a known problem with GP practices, esp in urban areas) and estimated the response rate from those who received the questionnaire was 64%. The issue is, were people who were less well, or fatigued, more likely to respond to questionnaires about fatigue and health?

By comparison, the Bowling SF36 data used face to face interviews, with a 78% response rate and the SF36 qs were part of a much broader questionnaire including lifestyle and finance - so healthy people are less likely to ignore because it doesn't apply to them. The Jenkinson SF36 figures had a 72% response rate, but this is of those mailed. Let's say 5% of the orginal list they mailed was incorrect/moved (quite a cautious assumption from my direct marketing experience) giving a net response rate of 76% - and again this was part of a larger survey including lifestyle, reducing the chance of healthy people opting out.

ETA: however, this study suggests that ill people might not be more likely to respond, though it does relate to questions on "subjective well-being (overall life satisfaction and self-assessed health)" rather than just health.

3. Only participants who visited their GP were included
To complicate things, Cella didn't use all the data from the original mailing. Instead, data was only used from respondents who subsequently visited their GP about a viral or other complaint and were selected as part of another study. So anyone who was very healthy and never visited their GP would not be included. Those who visited their GP more often would consequently have more chances to make this cohort than those who rarely visited their GP. All of this is likely to bias the sample against healthy individuals.

Precise figures are not given for the original 1994 study but from the figures they give it looks like the mean is very close to 13.6, compared to the 14.2 quoted by Cella for his sub-group, suggesting at least some bias here.

ETA I've found the fatigue case data for the Cella study (Postinfectious fatigue: prospective cohort study in primary care, p1335 under "stage 2 sample"): it gives 42.6% caseness, vs 38% for Pawlikowska, confirming the Cella cohort is more fatigued than the Pawlikowska one.

4.Data from the original study indicate this is an unhealthy cohort.
According to Pawlikowska, 38% of patients had a score about the original Chalder bimodal cut off of 3 (as used in the PACE protocol) and 18.3% of patients were substantially fatigues for 6 months or longer. Whoa, that looks unhealthy, esp as the paper quotes a 1990 paper that found only 10% of GP practice patients had fatigue for one month or more. I think there are some US studies indicating fatigue of over 6 months in the population is much less than 18%.

So, I'm pretty fatigued now, and so are you if you've read this far. But it looks like PACE have been using highly unsuitable 'normative' data. Again.

ETA: should mention that the Cella CFQ data is not normally distributed and therefore, like the SF36 data, is not suitable for use with parametic stats, such as the 'within 1 SD of the mean' formula used by PACE:

Similarly to the CFS group, the community group scores were not distributed normally but were positively skewed. Values of skewness for the nonclinical group ranged between 0.40 (Item 5) and 1.06 (Item 9) with a mean skewness of 0.77.

Dolphin · May 10, 2011

oceanblue said:
I plotted SF36 scores against 6MWT scores to see if that would throw some light on response bias. If response bias happens, you would expect to see this at the end of the trial (52 weeks), when factors like wanting to support the therapist/doctor come into play but not at baseline, when particpants don't even know which therapy they will get.

The graph shows a pretty consistent relationship between SF36 scores and 6MWT scores at baseline, shown by the blue trend line (and red squares). At 52 weeks there isn't such a clear pattern (green triangles), but the SF36 scores are higher than you would predict given the relationship between SF36 and 6MWT at baseline. It's as if the SF36 scores have been boosted a little - which could be the response bias.

- WSAS is data for the 4 quartiles given in the Cella paper, using PACE baseline data for the Work and Social Adjustment Scale. It's the same basic data as PACE baseline, but chopped up a different way. Nb all these data points are group means (each group is around 150 people), not data for individual participants.

View attachment 5485

This is all speculation though and comments welcome.
ps happy to supply excel file with data and graph if anyone wants it.

Thanks for doing the graph, oceanblue, it's great.

It might be good if another graph was done where the line was extended. And if "error bars" for the response bias could be put in, it would be great but not so important.
The equation of the line, y, (y=SF-36 PF) = (0.24616*6MWD) - 41.15385

I think the response bias in the SF-36 PF function for GET may be similar to the APT and SMC response biases (but smaller than the CBT one) because the response bias may incorporated in the 6MWD; the GET participants may be more motivated to show how fit they are now and willing to push themselves further. They also would go for more frequent continuous walks of 6 minutes duration so would probably be better at not going too fast or too slow to get the optimum distance out of the 6 minutes.

Move the 6MWD to the left a bit and one can get a similar distance above the line. Of course, there is no reason the response bias for CBT and GET should be exactly the same.

anciendaze · May 10, 2011

Dolphin said:
I know you are generally talking about SF-36 PF scores. But I specifically replied to the bit where you say there is a problem with the population data (because I presume that is what you are talking about) as it is obtained from those visiting doctors.

I will concede the point about the source of the data for SF-36.

However, there is a major problem with the health of the comparable population, and I am puzzled by a statement you made earlier. According to those running the study, CFS sufferers have no organic disease, we are merely suffering from "false illness beliefs" and deconditioning. In such a case, it is wrong to compare their recovery process to those with serious conditions like heart failure or COPD. Likewise, there is no reason for us to accept performance limitations of people twice our age. You can't have it both ways, either we have a real, serious, organic illness, or we do not.

It is interesting that they don't have 6MWDs for some groups; I have seen analyses called sensitivity analyses performed before where they look at various options e.g. "worst case scenario for missing values", "best case scenarios", etc. A slight problem we have here is that a lot of the people in the other groups didn't do it either. A lot of people in the world have somewhat disorganised lives or for whatever reason don't do all things they should e.g. if taking part in a research trial, do all the tests. It probably can't all be put down to them being made worse.

My point here is that researchers counted them as completing the study, without objective data. This makes the relative importance of objective measures in the thinking of those running the test obvious.

What can be seen, in studies that have included Actometer data, is that patients displace activity. If they are exhausted after GET, they are less likely to do a 6MWD. The reasoning here is that GET got higher marks, not by causing a higher percentage to decline the test, but by making those who declined the test more likely to perform poorly if they had taken it. We don't know, and nobody really wanted us to know.

There was a selection effect by close to half of those approached to decline to participate, or to officially drop out. We then have another 30% of those remaining not providing useful objective data. The end result is that the objective part of claims can only apply to about 35% of those they intended to treat. Throw in major diagnostic confusion, and there is no clear idea of who, other than therapists, actually benefits.

The end result is a kind of "you can't prove nuthin', copper" defense, which tells you the objective scientific data isn't worth much at all. This is especially telling when only group measures are reported. If the goal was to muddy the waters, this study would serve admirably.

I sense you have an opinion this was not all that unusual for studies of psychiatric treatment. You may well be right. This is not a vote of confidence in the field.

I know of one instance in which a repairman was checking out equipment used for ECT at a psychiatric hospital when one of the staff complimented him on the new equipment not causing the violent contractions seen with older models. He immediately checked, and found a safety plug had never been removed when it was unpacked. The new equipment had been in use for a year.

oceanblue · May 11, 2011

Dolphin said:
I think the response bias in the SF-36 PF function for GET may be similar to the APT and SMC response biases (but smaller than the CBT one) because the response bias may incorporated in the 6MWD; the GET participants may be more motivated to show how fit they are now and willing to push themselves further. They also would go for more frequent continuous walks of 6 minutes duration so would probably be better at not going too fast or too slow to get the optimum distance out of the 6 minutes.

Move the 6MWD to the left a bit and one can get a similar distance above the line. Of course, there is no reason the response bias for CBT and GET should be exactly the same.

Your suggestion for GET being affected by response bias on the 6MWD is certainly possible, but I'm not sure if it's any more plausible than GET having a small but real effect on walking capability. And I would have predicted that SMC would have a smaller response bias than APT, but that doesn't appear to be the case. There were only 5 sessions of SMC with relatively low expectations and satisfaction, compared with 13 sessions of APT (plus 3 of SMC), with a strong therepuetic alliance and high satisfaction. APT is set up for response bias.

It appears from the data that something is going on, but i'm struggling to see a clear, compelling explanation for what it is. Help welcome.

I might have a crack at a deluxe graph in due course.

Dolphin · May 11, 2011

oceanblue said:
Your suggestion for GET being affected by response bias on the 6MWD is certainly possible, but I'm not sure if it's any more plausible than GET having a small but real effect on walking capability. And I would have predicted that SMC would have a smaller response bias than APT, but that doesn't appear to be the case. There were only 5 sessions of SMC with relatively low expectations and satisfaction, compared with 13 sessions of APT (plus 3 of SMC), with a strong therepuetic alliance and high satisfaction. APT is set up for response bias.

There is no reason it can't have been both an increase in GET plus a bit of response bias/training effect.
For example, if the GET group had scored 352.88, they would have increased on the 6MWT by 40.879m (which is more than the other groups) but their SF36 PF value of 57.7 would have exactly the same inflation as CBT (12.215) over the predicted value.

The SF-36 PF questionnaire asks about people's limitations. The APT group may recognise their limitations rather than have an inflated view of what they can do so that might be why it is not bigger than the SMC group.

oceanblue · May 11, 2011

Dolphin said:
There is no reason it can't have been both an increase in GET plus a bit of response bias/training effect.
For example, if the GET group had scored 352.88, they would have increased on the 6MWT by 40.879m (which is more than the other groups) but their SF36 PF value of 57.7 would have exactly the same inflation as CBT (12.215) over the predicted value.

The SF-36 PF questionnaire asks about people's limitations. The APT group may recognise their limitations rather than have an inflated view of what they can do so that might be why it is not bigger than the SMC group.

It's possible, but I'm not convinced that the SMC group should have an inflated view of what they can do either - and the APT group still have all those extra sessions and strong relationship with the therapist. The problem with all of this is that are so many 'ifs' and 'buts' that while a plausible case can be made, so can a plausible counter case. I'd love to use this info in a wider context, eg a letter to a journal, but I don't think the evidence we have is clear enough as things stand.

Dolphin · May 11, 2011

It would be interesting to get/watch out for baseline data from other studies and see does the same equation hold. Although I don't think the 6 minute walking test has been used in many ME/CFS trials. Perhaps it was in the Jason et al. (2007) study.

oceanblue · May 11, 2011

Wessely talk featuring PACE trial

"Health in mind and body: bridging the gap"
http://www.foundation.org.uk/events/audios/audiopdf.htm?e=440&s=1200

20 minute audio and slides, PACE bit starts about 10 mins in.

Some bits I noted:
- He said CFS wasn't down to personality
- but perpetuation was down to behavioural and psychological factors

- described PACE as the 'final definitive trial' and as 'one of the most beautiful behavioural medicine trials we have ever seen'. He showed the timeline fatigue graph (fig 2A in the paper) and called it a 'good result: we have improved - haven't cured - the physical health, the psychological health, the functioning etc of a large number of people'. I'm not sure that quite squares with what Michael Sharpe said in describing the results: that they needed to treat 7 people with CBT or GET for one to improve by a moderate amount.

Enjoy.

Esther12 · May 11, 2011

Thanks OB. Sometimes I get the impression that Wessely realises he's close to being a quack... not that time. If I was a naive medical student, I'd have been convinced by him.

Sean · May 11, 2011

- described PACE as the 'final definitive trial' and as 'one of the most beautiful behavioural medicine trials we have ever seen'.

Beauty is in the eye of the beholder.

Angela Kennedy · May 12, 2011

Sean said:
Beauty is in the eye of the beholder.

Yes, it is disturbing to see Wessely resorting to the word 'beautiful' as a rhetorical device to defend PACE!

Ps, actually - what am I saying? It's ultimately disturbing to see this described by him as the 'final definitive trial'. THAT is wholly irresponsible.

Bob · May 12, 2011

I was actually surprised to see that Wessely is promoting this trial as if he is proud of it.
It shouldn't have surprised me, of course.
But I thought that the authors were actually quite taken aback at how horrendously bad the results were, such that they had to fiddle with the protocol and only selectively report the results.
I suppose that any 5m government-funded study into psychological 'treatments' is beautiful in his eyes, especially when he can claim that it was a successful study.
Now I guess it's the job of our community to make sure that everyone else knows that this Trial was not successful, and the results not relevent for the ME community. Not an easy task, but a crucial one, in my opinion.

Angela Kennedy · May 12, 2011

Bob said:
I was actually surprised to see that Wessely is promoting this trial as if he is proud of it.
It shouldn't have surprised me, of course.
But I thought that the authors were actually quite taken aback at how horrendously bad the results were, such that they had to fiddle with the protocol and only selectively report the results.
I suppose that any 5m government-funded study into psychological 'treatments' is beautiful in his eyes, especially when he can claim that it was a successful study.
Now I guess it's the job of our community to make sure that everyone else knows that this Trial was not successful, and the results not relevent for the ME community. Not an easy task, but a crucial one, in my opinion.

I think you're right Bob.

anciendaze · May 12, 2011

Angela Kennedy said:
Yes, it is disturbing to see Wessely resorting to the word 'beautiful' as a rhetorical device to defend PACE!

Ps, actually - what am I saying? It's ultimately disturbing to see this described by him as the 'final definitive trial'. THAT is wholly irresponsible.

I think that tells you not to hold your breath for any objective results in follow up. He clearly feels that showing you can influence opinions by talking to people for a year is the ne plus ultra of scientific research.

PACE Trial and PACE Trial Protocol

Guest

Senior Member

Senior Member

Senior Member

Senior Member

Senior Member

Guest

Senior Member

Senior Member

Guest

Senior Member

Guest

Senior Member

Guest

Senior Member

Senior Member

Senior Member

Senior Member

Senior Member

Senior Member