PACE Trial and PACE Trial Protocol

Dolphin · Feb 19, 2011

Thanks oceanblue

oceanblue said:
The Oxford criteria require that "c) The fatigue is severe and disabling". However, the PACE trial went futher than this, requiring that the disability (activity) element of a SF-36 PF subscale score of 65 or less. Thus the PACE trial recruitment criteria sets a disability threshold of 65 or less.

Using this threshold, 12% (251 ex 2,080 diagnosed as meeting the Oxford Criteria, from Figure 1) were excluded from the trial due to a PF score of 70 or more (let's assume it was 70 exactly). Yet these excluded patients had already been diagnosed by trial clinicians as suffering from 'severe & disabling fatigue'.

So we know that, according to PACE clinicians, a PF score of 70 counts as disabling fatigue.
Yet, when it comes to assessing recovery, a PF score of 60, ie 10 points lower , counts as 'normal'.

Great spot. :Sign Good Job: (one tiny point: it's slightly awkward as they changed the entry criteria part way through from <=60 to <=65. But I think they would let this through ok and it would take up words to cover both).

oceanblue said:
Black is white, the moon's a balloon and we're a bunch of malingers.

:Retro smile:
:worried:

oceanblue · Feb 19, 2011

Esther12 said:
I'm tempted to agree with Cort's appraisal... it's conclusions are so weak it might even be good for us. But given how the study has been promoted and reported, and how similarly weak studies have been used misleadingly in the past, I'm afraid that's probably a rather panglossian view of things.

I think the actual evidence from PACE, that CBT & GET have a pretty limited impact, can be powerful tools to expose the limitations of the biopsychosocial model. However, it's also clear from the way that the results were presented in the paper, spun by the authors and reported in the media that as things stand the PACE trial is pretty unhelpful to a better understanding of this illness.

The question is, how do we use the evidence in PACE to turn things around? Currently I'm not sure, though letter to the Lancet will be a start. So I still think it's a big opportunity but i'm not at all sure how we make the most of it.

Dolphin · Feb 19, 2011

Need full paper: Chalder fatigue normative data from Norway

Can anyone get the full paper?
It'd be good to see more of the data e.g. SDs

Remember they're claiming a score of 18 or less and one has a normal level of fatigue.

J Psychosom Res. 1998 Jul;45(1 Spec No):53-65.

Fatigue in the general Norwegian population: normative data and associations.
Loge JH, Ekeberg O, Kaasa S.

Department of Behavioural Sciences in Medicine, University of Oslo, Norway. j.h.loge@medisin.uio.no

Abstract
Population norms for interpretation of fatigue measurements have been lacking, and the sociodemographic associations of fatigue are poorly documented. A random sample of 3500 Norwegians, aged 19-80 years, was therefore investigated. A mailed questionnaire included the fatigue questionnaire (11 items) in which the sum score of the responses (each scored 0, 1, 2, 3) is designated as total fatigue (TF). Sixty-seven percent of those receiving the questionnaire responded. Women (TF mean=12.6) were more fatigued than men (TF mean=11.9), and 11.4% reported substantial fatigue lasting 6 months or longer. TF and age were weakly correlated (men: r=0.17; women: r=0.09). No firm associations between fatigue and social variables were found. Disabled and subjects reporting health problems were more fatigued than subjects at work or in good health. Fatigue is highly prevalent in somatic and psychiatric disorders, but is often neglected. This national representative sample provides age- and gender-specific norms that will allow for comparisons and interpretations of fatigue scores in future studies.PMID: 9720855 [PubMed - indexed for MEDLINE]

Hope123 · Feb 19, 2011

Very slowly reading here but quick comments:

1. PM me if you want the full article.

2. 6 minute walk test:

Optimal reference equations from healthy population-based samples using standardized 6MWT methods are not yet available. In one study, the median 6MWD was approximately 580 m for 117 healthy men and 500 m for 173 healthy women (48). A mean 6MWD of 630 m was reported by another study of 51 healthy older adults (53). Differences in the population sampled, type and frequency of encouragement, corridor length, and number of practice tests may account for reported differences in mean 6MWD in healthy persons. Age, height, weight, and sex independently affect the 6MWD in healthy adults; therefore, these factors should be taken into consideration when interpreting the results of single measurements made to determine functional status. We encourage investigators to publish reference equations for healthy persons using the previously mentioned standardized procedures.

Article from 2002 from American Thoracic Society guidelines -- you can spring off of this to find a more recent article.

http://ajrccm.atsjournals.org/cgi/content/full/166/1/111
3. Have heard that CAA will be analyzing PACE and publishing a statement. I've asked them to consider widely circulating it as much as possible.

Cort · Feb 19, 2011

That's good stuff Hope123 - they weren't even close to normative for healthy women. Look at that 630 meters for healthy older adults....whew...

I put my take on the front page http://forums.aboutmecfs.org/content.php?369-A-Hitch-in-its-Step

Dolphin · Feb 19, 2011

A look at what happened to the outcome measures that had been promised

Registering one's trial and outcome measures is very important now - some journals won't accept trials that haven't been registered.
The authors went one step further and published their protocols in a journal.

White PD, Sharpe MC, Chalder T, DeCesare JC, Walwyn R; PACE trial group. Protocol for the PACE trial: a randomised controlled trial of adaptive pacing, cognitive behaviour therapy, and graded exercise, as supplements to standardised specialist medical care versus standardised specialist medical care alone for patients with the chronic fatigue syndrome/myalgic encephalomyelitis or encephalopathy. BMC Neurol. 2007 Mar 8;7:6.

http://www.biomedcentral.com/1471-2377/7/6

Here's a look at what happened to them
[Aside: The analyses they promised would need to be looked at separately (I'm not promising I'll do it)]

Measures

Primary outcome measures – Primary efficacy measures (you're really not supposed to touch these)

Since we are interested in changes in both symptoms and disability we have chosen to designate both the symptoms of fatigue and physical function as primary outcomes. This is because it is possible that a specific treatment may relieve symptoms without reducing disability, or vice versa. Both these measures will be self-rated.

The 11 item Chalder Fatigue Questionnaire measures the severity of symptomatic fatigue [27], and has been the most frequently used measure of fatigue in most previous trials of these interventions. We will use the 0,0,1,1 item scores to allow a possible score of between 0 and 11. A positive outcome will be a 50% reduction in fatigue score, or a score of 3 or less, this threshold having been previously shown to indicate normal fatigue [27].

Not given per protocol.
Likert scoring used (not necessarily a bad thing but a change)
Normal fatigue no longer given as a primary outcome measure
(when it is given later, not in the abstract, the 50% threshold is not used and a score of 18 is seen as acceptable which, most people would say, does not equate to 3 or less on bimodal scale (out of 11); a score of 18 is a minimum of 4 (out of 11) on the bimodal scoring and a maximum of 9).

The SF-36 physical function sub-scale [29] measures physical function, and has often been used as a primary outcome measure in trials of CBT and GET. We will count a score of 75 (out of a maximum of 100) or more, or a 50% increase from baseline in SF-36 sub-scale score as a positive outcome. A score of 70 is about one standard deviation below the mean score (about 85, depending on the study) for the UK adult population [51,52].

Not given per protocol.
Normal physical functioning no longer given as a primary outcome measure
(when it is given later, not in the abstract, the 50% increase is not used and a score of 60 or more is suddenly seen as acceptable despite the fact that a score of 65 or less lets one into the trial).

Those participants who improve in both primary outcome measures will be regarded as overall improvers.

Not given per protocol.
Data is given, but not as a primary outcome measure, but it uses these new, lower thresholds, mentioned above

Secondary outcome measures – Secondary efficacy measures

1. The Chalder Fatigue Questionnaire Likert scoring (0,1,2,3) will be used to compare responses to treatment [27].

Has become one of the primary outcome measures.

2. The self-rated Clinical Global Impression (CGI) change score (range 1 – 7) provides a self-rated global measure of change, and has been used in previous trials [45]. As in previous trials, we will consider scores of 1 or 2 as a positive outcome ("very much better" and "much better") and the rest as non-improvement [23].

Per protocol.

3. The CGI change scale will also be rated by the treating therapist at the end of session number 14, and by the SSMC doctor at the 52-week review.

Neither of these are given.

4. "Recovery" will be defined by meeting all four of the following criteria: (i) a Chalder Fatigue Questionnaire score of 3 or less [27], (ii) SF-36 physical Function score of 85 or above [47,48], (iii) a CGI score of 1 [45], and (iv) the participant no longer meets Oxford criteria for CFS [2], CDC criteria for CFS [1] or the London criteria for ME [40].

Not given.
The commentary by Knoop & Bleijenberg talks about the percentage in "recovery" but doesn't use anything close to this definition (just to clarify, they don't give extra data).

5. The Hospital Anxiety and Depression Scale scores in both anxiety and depression sub-scales [38].

Given (both)

6. The Work and Social Adjustment scale provides a more comprehensive measure of participation in occupational and domestic activities [33].

Given (both)

7. The EuroQOL (EQ-5D) provides a global measure of the quality of life [39].

Not given

8. The six-minute walking test will give an objective outcome measure of physical capacity [31].

Given

9. The self-paced step test of fitness [43].

Not given

10. The Borg Scale of perceived physical exertion [44], to measure effort with exercise and completed immediately after the step test.

Not given

11. The Client Service Receipt Inventory (CSRI), adapted for use in CFS/ME [31], will measure hours of employment/study, wages and benefits received, allowing another more objective measure of function.

Not given

12. An operationalised Likert scale of the nine CDC symptoms of CFS [1].

Not given per protocol.
We are just give a composite yes/no score for the 8 CDC symptoms combined and a percentage present/absent score for two of the symptoms.
Likert scoring is the opposite of yes/no (or present/abscent) scoring
What they actually asked that patients at different stages about each of the symptoms was the following:

not present at all
present a little
presence more often than not
present most of the time
present all of the time

but we don't get this data.

13. The Physical Symptoms (Physical Health Questionnaire 15 items(PHQ15)) [35].

Not given

14. A measurement of participant satisfaction with the trial will also be taken at 52 weeks [53].

Given

Adverse outcomes

Adverse outcomes (score of 5–7 of the self-rated CGI) will be monitored by examining the CGI at all follow-up assessment interviews [49]. An adverse outcome will be considered to have occurred if the physical function score of the SF-36 [28] has dropped by 20 points from the previous measurement. This deterioration score has been chosen since it represents approximately one standard deviation from the mean baseline scores (between 18 and 27) from previous trials using this measure [23,25]. Furthermore, the RN will enquire regarding specific adverse events at all follow-up assessment interviews.

Not given per protocol
A lot of people would miss this but they redefined it to be “any two consecutive assessment interviews” [Quote: “Serious deterioration in health was defined as any of the following outcomes: a short form-36 physical function score decrease of 20 or more between baseline and any two consecutive assessment interviews”] Comment: it is already hard for a patient to drop 20 points on the SF-36.

Dolphin · Feb 19, 2011

Web Appendix Table C: Description of Serious Adverse Reactions

Web Appendix Table C: Description of Serious Adverse Reactions

Description Relationship to treatment SAR category*

Adaptive pacing therapy (2)
1. Suicidal thoughts possibly related e
2. Worsened depression possibly related e

Cognitive behaviour therapy (4)
1. Episode of self harm possibly related f
2. Low mood and episode of self harm possibly related e & f
3. Worsened mood and CFS symptoms possibly related d
4. Threatened self harm possibly related e

Graded exercise therapy (2)
1. Deterioration in mobility and self-care possibly related d
2. Worse CFS symptoms and function possibly related d

Specialist medical care alone (2)
1. Worse CFS symptoms and function probably related d
2. Increased depression and incapacity possibly related d

*SAR categories: a) Death; b) Life-threatening event; c) Hospitalisation (hospitalisation for elective treatment of
a pre-existing condition is not included), d) Increased severe and persistent disability, defined as a significant
deterioration in the participants ability to carry out their important activities of daily living of at least four weeks
continuous duration; e) Any other important medical condition which may require medical or surgical intervention
to prevent one of the other categories listed; f) Any episode of deliberate self-harm.

To me, although it's a small number, the two experienced by people doing GET could be important. Maybe hard to bring up in a letter of 250 words. But something to be used perhaps when safety is being claimed.

oceanblue · Feb 20, 2011

More great digging, dolphin, thank you. The PACE group have a lot of questions to answer. Frankly, so does the MRC Trial Steering Group which is supposed to independently oversee the trial and keep the authors honest. It seems to have fallen asleep on the job.

oceanblue · Feb 20, 2011

Dolphin said:
Here's some UK normative data:

Short form 36 (SF36) health survey questionnaire: normative data for adults of working age. BMJ. 1993 May 29;306(6890):1437-40. Jenkinson C, Coulter A, Wright L. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1677870/pdf/bmj00022-0017.pdf

Dolphin, I think I love you!

Sorry, getting carried away here because there are some gems in that paper that expose the flaws in the PACE choice of thresholds.

This study is based on completed questionnaires form 9,332 people of working age in central England.

The survey gives SF-36 PF scores for the whole sample, for people who reported a longstanding illness and, crucially, people who did not report a longstanding illness. This last group might be the best estimation of 'healthy'. Here are the mean SF-36 scores with SDs in brackets and the threshold that would result from using the PACE formula of "mean minus 1 SD":

'Healthy': 92.5 (13.4) = 79.1
'Chronically ill': 78.3 (23.2) = 55.1
'Population*': 89 (16) = 73
*oh damn, they don't seem to give this separately, this is my guesstimation from looking at the data they do give

Since the PF scale only scores in 5 point intervals (e.g. 60,65,70) these translate as PF threshold scores as:
Healthy = 80, population = 70 or 75. PACE used 60.

slightly more complex point
As a bonus, they provided SF-36 PF scores for people who had consulted a doctor in the 2 weeks prior to completing the questionnaire. This is a pretty close approximation of the 'GP attenders' used to establish norm data for the fatigue scale. The scores are
81.6 (23) = 58.6

This shows that not only are the GP attenders substantially less well than 'healthy' people (81.6 vs 89-ish), they also have a much bigger SD, which has the effect of lowering the threshold even further (60 vs 70 or 75). Obviously this is for PF scores not fatigue scores, but it does illustrate how GP attenders differ from the normal population

Esther12 · Feb 20, 2011

Thanks for that OB... this does seem to be the key point so far.

Dolphin · Feb 20, 2011

Dolphin said:
Hi Cort and all,

As the file is too large to upload into the library, I have uploaded a file to rapidshare.com which people can hopefully download for free http://rapidshare.com/files/448676468/11_PACE_Trial_Protocol.pdf . It does not involve torrents.

On page 162, one can see the Chalder fatigue scale.
It also includes the other questionnaires used.

Bye,
-------
If somebody could confirm they could download it, I'd appreciate it. Thanks.

Some people had problems with the RapidShare link.
I've placed it in another place now:
https://www.yousendit.com/download/T2pGd0VBaFI4NVh2Wmc9PQ
or http://bit.ly/htzc1Y

I can also E-mail to you if your account can cope with an 11MB file (gmail accounts can).

My granny (91) is here at the moment so won't be able to contribute for a little while.

oceanblue · Feb 20, 2011

Dolphin said:
Great spot. (one tiny point: it's slightly awkward as they changed the entry criteria part way through from <=60 to <=65. But I think they would let this through ok and it would take up words to cover both).

Thanks for pointing that out. I'll keep it simple and just compare the PF score of 65=disabling fatigue with 60 = 'normal'.

oceanblue · Feb 20, 2011

Re: reference used to justify major changes in reporting of primary outcomes

Dolphin said:
It's free here in case anyone wants it:
http://www.mayoclinicproceedings.com/content/77/4/371.long

That's a 2002 paper. They published their protocol in 2007.

Sorry, my mistake, it was ref 30 not 31, and this was publisehd in 2009, after their protocol
Measurement in clinical trials: a neglected issue for statisticians?

Abstract

Biostatisticians have frequently uncritically accepted the measurements provided by their medical colleagues engaged in clinical research. Such measures often involve considerable loss of information. Particularly, unfortunate is the widespread use of the so-called 'responder analysis', which may involve not only a loss of information through dichotomization, but also extravagant and unjustified causal inference regarding individual treatment effects at the patient level, and, increasingly, the use of the so-called number needed to treat scale of measurement. Other problems involve inefficient use of baseline measurements, the use of covariates measured after the start of treatment, the interpretation of titrations and composite response measures. Many of these bad practices are becoming enshrined in the regulatory guidance to the pharmaceutical industry. We consider the losses involved in inappropriate measures and suggest that statisticians should pay more attention to this aspect of their work.

I'm afraid that tackling this paper is way above my pay grade.

oceanblue · Feb 20, 2011

Esther12 said:
Thanks for that OB... this does seem to be the key point so far.

Thanks, Esther12.
OB (Obi?!)

Cort · Feb 20, 2011

Dolphin said:
Registering one's trial and outcome measures is very important now - some journals won't accept trials that haven't been registered.
The authors went one step further and published their protocols in a journal.

Dolphin are you saying that all the "not-givens' per protocol are things they said they would and did not make it into the study?

If so do we have any idea if they actually measured any of those; ie did they measure them and decide not to report on them or did they just not do them?

Can we show there's a pattern to the non-used tests; ie that they tend to better indicators of functionality?

Snow Leopard · Feb 20, 2011

They definitely measured them. I bet they will claim that they will publish the metrics like the EQ-5D and the hours worked in a later paper where they can spin the unimpressive data by saying, 'but look, the treatment will still save a few dollars'. Compared to the billions in economic costs each year....

Dolphin · Feb 20, 2011

Cort said:
Dolphin are you saying that all the "not-givens' per protocol are things they said they would and did not make it into the study?

If so do we have any idea if they actually measured any of those; ie did they measure them and decide not to report on them or did they just not do them?

Like Snow Leopard said, I think there is a 99.99% chance they were measured. The protocol paper was published in 2007 after the trial had long started. The questionnaires can be seen at: https://www.yousendit.com/download/T2pGd0VBaFI4NVh2Wmc9PQ . So "not given" means they measured them but didn't publish them.

Cort said:
Can we show there's a pattern to the non-used tests; ie that they tend to better indicators of functionality?

I'm not sure. The whole point about registering primary and secondary outcome measures is that one can't then not publish them and come up with new ones that suit so it's a bad thing by itself. Such "rules" are mainly designed for pharmaceutical companies but really apply for anyone. They may be part of the CONSORT guidelines.

Dolphin · Feb 20, 2011

Snow Leopard said:
They definitely measured them. I bet they will claim that they will publish the metrics like the EQ-5D and the hours worked in a later paper where they can spin the unimpressive data by saying, 'but look, the treatment will still save a few dollars'. Compared to the billions in economic costs each year....

Yes. Or alternatively they may be buried. In the FINE Trial, the authors were challenged that they didn't publish the step test (the only objective measure they used) - they hadn't mentioned in the initial paper they hadn't used it.
Their reply was a weak one:

We did not report the step test as an outcome because of a large amount of missing data.

This sort of behaviour is watched like a hawk when it is done in trials of pharmaceuticals. I'm no expert on who does it - the FDA would be one (and similar agencies in other places). We need to do it for non-pharmacological trials although it probably will get mentioned in some reviews like Cochrane.

Bob · Feb 20, 2011

I might be wrong about this... So please let me know if I am...
But I think I've just come across an almost glaringly obvious bit of spin in the reported results...

All the patients in the trial received specialist medical care (SMC), and then the CBT and GET was given along side this.
I think that the improvement scale measurements (or at least some of them - I haven't checked them all) have been reported as compared to the SMC alone group. So any measurements are reported over and above the improvements that the SMC alone group reported.

But I think that the percentage of patients who reported improvement has been reported as an absolute figure, and not as a comparison with the SMC along figures.

In table 5, 25% of patients reported an improvement in their symptoms when using SMC alone, and 41% of patients reported an improvement with GET (which i think means GET+SMC).
In which case, only 16% of patients reported an improvement using GET over and above the findings of the SMC alone group.
So GET and CBT actually only helped 16% of the patients, as compared to the SMC alone group, and so 84% of patients were not helped by CBT or GET.

Am I right in thinking that the SMC-alone group was used as the control group in this study? If so then why haven't the figures been reported as such? i.e. why haven't the GET and CBT figures been reported as a comparison with the control group?

Basically, only 16% of patients reported an improvement using GET or CBT, over and above the control group who were receiving ordinary medical care.

Am I right in this, or have i missed something here?

16% is an appallingly low figure for a 5m government funded study. What a bloody waste of money.

If this is the case, then I think we should all be focusing on the 16% figure rather than the 41% figure, and bring this to the attention of the media and the ME organisations.

Like I said in an earlier post, it would be hard for the NHS to justify giving group courses of GET when 60% of the patients would not be helped.
But if only 16% of patients are helped by CBT or GET, and 84% are not helped, then this makes is almost totally unjustifiable.

oceanblue · Feb 20, 2011

Bob said:
In table 5, 25% of patients reported an improvement in their symptoms when using SMC alone, and 41% of patients reported an improvement with GET (which i think means GET+SMC).
In which case, only 16% of patients reported an improvement using GET over and above the findings of the SMC alone group.
So GET and CBT actually only helped 16% of the patients, as compared to the SMC alone group, and so 84% of patients were not helped by CBT or GET.

The SMC alone group, was the control group in this study, right? So why haven't the figures been reported as such? i.e. why haven't the GET and CBT figures been reported as a comparison with the control group?

You're right that CBT/GET basically helped a net 16% extra patients over and above SMC, which is not very impressive. However, I'm not sure the researchers were bound to report this figure as a difference, as opposed to reporting figures for both CBT and SMC. I don't know enough about stats to know what is strictly 'correct'. And it looks like you're quoting the CGI figures for patients who say they are much/very much improved.

But basically when Peter White said in the media that CBT helped 6/10 patients (using their slightly odd definition of 'improved', rather than CGI scores) he was missing out the crucial info that in this case 4.5/10 patients were helped by nothing at all (SMC).

PACE Trial and PACE Trial Protocol

Senior Member

Guest

Senior Member

Senior Member

Phoenix Rising Founder

Senior Member

Senior Member

Guest

Guest

Senior Member

Senior Member

Guest

Guest

Guest

Phoenix Rising Founder

Hibernating

Senior Member

Senior Member

Senior Member

Guest