Reported, part-reported and unreported outcome measures from the PACE Trial


Another person and I put this together fairly quickly. Hopefully there are no errors, but it's probably best to check it yourself, or ask a few people, if you're using it for something formal.

Primary outcome measures – Primary efficacy measures:

The 11 item Chalder Fatigue Questionnaire measures the severity of symptomatic fatigue [27], and has been the most frequently used measure of fatigue in most previous trials of these interventions. We will use the 0,0,1,1 item scores to allow a possible score of between 0 and 11. A positive outcome will be a 50% reduction in fatigue score, or a score of 3 or less, this threshold having been previously shown to indicate normal fatigue [27].
Not reported. Refusal to do so.

Mean Chalder Fatigue Questionnaire (Likert scoring) scores reported instead.

The SF-36 physical function sub-scale [29] measures physical function, and has often been used as a primary outcome measure in trials of CBT and GET. We will count a score of 75 (out of a maximum of 100) or more, or a 50% increase from baseline in SF-36 sub-scale score as a positive outcome. A score of 70 is about one standard deviation below the mean score (about 85, depending on the study) for the UK adult population [51,52].
Not reported. Refusal to do so.

Mean SF-36 physical function scores reported instead.

Those participants who improve in both primary outcome measures will be regarded as overall improvers.
Not reported. Refusal to do so.

Total score for pre-specified primary outcomes: 0 out of 3, or 0 out of 7 if counting individual parts.
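
To make the abandoned primary outcomes concrete, here is a rough Python sketch of the definitions quoted above from the protocol. It assumes each of the 11 Chalder items is answered on a 0 to 3 scale (scored bimodally as 0,0,1,1), and it reads "50% increase from baseline" as a follow-up score of at least 1.5 times baseline. The function names are made up for illustration and are not taken from the trial.

```python
def bimodal_fatigue_score(item_responses):
    """Score 11 Chalder items bimodally: responses 0 or 1 count 0, responses 2 or 3 count 1."""
    if len(item_responses) != 11:
        raise ValueError("expected 11 item responses")
    if any(r not in (0, 1, 2, 3) for r in item_responses):
        raise ValueError("each response must be 0, 1, 2 or 3")
    return sum(1 for r in item_responses if r >= 2)


def fatigue_positive(baseline, followup):
    """Positive outcome: a 50% reduction in bimodal score, or a score of 3 or less."""
    return followup <= 3 or followup <= 0.5 * baseline


def physical_function_positive(baseline, followup):
    """Positive outcome: SF-36 physical function of 75 or more, or a 50% increase."""
    # Assumption: a 50% increase is undefined for a baseline of 0, so only
    # the absolute threshold applies in that case.
    return followup >= 75 or (baseline > 0 and followup >= 1.5 * baseline)


def overall_improver(fatigue_base, fatigue_follow, pf_base, pf_follow):
    """Overall improver: positive on both primary outcome measures."""
    return (fatigue_positive(fatigue_base, fatigue_follow)
            and physical_function_positive(pf_base, pf_follow))
```

None of these three yes/no results were published; only mean scores were.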

Secondary outcome measures – Secondary efficacy measures

1. The Chalder Fatigue Questionnaire Likert scoring (0,1,2,3) will be used to compare responses to treatment [27].
Elevated to primary outcome.

2. The self-rated Clinical Global Impression (CGI) change score (range 1 – 7) provides a self-rated global measure of change, and has been used in previous trials [45]. As in previous trials, we will consider scores of 1 or 2 as a positive outcome ("very much better" and "much better") and the rest as non-improvement [23].
Percentages have been reported in score bands of 1+2, 3+4+5, 6+7.

3. The CGI change scale will also be rated by the treating therapist at the end of session number 14, and by the SSMC doctor at the 52-week review.
Neither of these has been reported in any detail, but the SSMC doctor-rated scores were used to substitute missing patient-rated CGI data in the recovery criteria.

4. "Recovery" will be defined by meeting all four of the following criteria: (i) a Chalder Fatigue Questionnaire score of 3 or less [27], (ii) SF-36 physical Function score of 85 or above [47,48], (iii) a CGI score of 1 [45], and (iv) the participant no longer meets Oxford criteria for CFS [2], CDC criteria for CFS [1] or the London criteria for ME [40].
No. All parts of the recovery criteria were changed, some drastically; there is now overlap between the baseline entry criteria and the recovery criteria.

5. The Hospital Anxiety and Depression Scale scores in both anxiety and depression sub-scales [38].

6. The Work and Social Adjustment scale provides a more comprehensive measure of participation in occupational and domestic activities [33].

7. The EuroQOL (EQ-5D) provides a global measure of the quality of life [39].
Yes, but only the overall value, not each dimension.

8. The six-minute walking test will give an objective outcome measure of physical capacity [31].
Yes, but after minimal results this measure is now being downplayed as unreliable and not an "objective outcome measure of physical capacity".

9. The self-paced step test of fitness [43].
Only in graph form, and reported 4 years after the first paper was published. Null result challenges the rationale and some reported benefits of CBT and GET (suggests patients not increasing activity as presumed). A request for summary figures was rejected as 'vexatious'.

10. The Borg Scale of perceived physical exertion [44], to measure effort with exercise and completed immediately after the step test.
Sort of: Borg/%max HR reached in Figure 2 of mediation analysis.

11. The Client Service Receipt Inventory (CSRI), adapted for use in CFS/ME [31], will measure hours of employment/study, wages and benefits received, allowing another more objective measure of function.
Details reported, but after null results the investigators denied its usefulness as a "more objective measure of function".

12. An operationalised Likert scale of the nine CDC symptoms of CFS [1].
Sort of, and always in violation of the CDC criteria (symptoms were counted over 1 week, not 6 months).

Likert scale data (that is, non-binary data) were not given.

Instead all that was given was present/absent data and only for two symptoms: "Poor concentration or memory" and "Postexertional malaise".

Also, a composite yes/no score for the 9 CDC symptoms ["Chronic fatigue syndrome symptom count"] is given.

13. The Physical Symptoms (Physical Health Questionnaire 15 items (PHQ15)) [35].

14. A measurement of participant satisfaction with the trial will also be taken at 52 weeks [53].
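
For reference, the protocol's four-part "recovery" definition (item 4 above) can be written out as a simple check. This is only a sketch of the original protocol wording, not of the revised criteria that were eventually published. The class and field names are invented for illustration, and it assumes the fatigue score is the bimodal (0 to 11) one.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    fatigue_bimodal: int          # Chalder Fatigue Questionnaire, bimodal score 0-11
    sf36_physical_function: int   # SF-36 physical function sub-scale, 0-100
    cgi: int                      # self-rated CGI change score, 1-7
    meets_oxford: bool            # still meets Oxford criteria for CFS
    meets_cdc: bool               # still meets CDC criteria for CFS
    meets_london: bool            # still meets London criteria for ME


def recovered_per_protocol(a):
    """All four protocol criteria must hold at the same time."""
    no_longer_a_case = not (a.meets_oxford or a.meets_cdc or a.meets_london)
    return (a.fatigue_bimodal <= 3
            and a.sf36_physical_function >= 85
            and a.cgi == 1
            and no_longer_a_case)
```

Under this wording a participant with an SF-36 physical function score of 80 would not count as recovered, whatever their other scores were.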

Other outcomes

The above does not include other outcomes.

Under assumptions they state:
"We propose that a clinically important difference would be between 2 and 3 times the improvement rate of SSMC."
That would have related to the now-abandoned primary outcomes. It certainly isn't possible now: the improvement rates were usually so high in all groups that two to three times the SSMC rate could not be reached (it would hit or exceed 100%).
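
The arithmetic behind that point is simple: a rate cannot exceed 100%, so the largest possible multiple of the SSMC improvement rate is 1 divided by that rate. A small sketch follows; the rates in the loop are hypothetical values chosen only to show the pattern, not figures from the trial.

```python
def max_achievable_multiple(ssmc_rate):
    """Largest multiple of the SSMC rate that still fits under 100%."""
    if not 0 < ssmc_rate <= 1:
        raise ValueError("ssmc_rate must be a proportion in (0, 1]")
    return 1.0 / ssmc_rate


def target_is_reachable(ssmc_rate, multiple):
    """True if multiple * ssmc_rate does not exceed 100%."""
    return ssmc_rate * multiple <= 1.0


# Hypothetical SSMC improvement rates, for illustration only.
for rate in (0.2, 0.4, 0.6):
    print(rate, target_is_reachable(rate, 2), target_is_reachable(rate, 3))
```

Once the comparison group's rate goes above 50%, even a doubling is out of reach, and above about 33% a tripling is.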

One of the two adverse outcomes was changed to a drop of 20 points over two follow-up measurements instead of one. But measurements were only taken at 0, 12, 24 and 52 weeks, so two consecutive measurements may be several months apart, and relapses lasting months would not be detected (though they may be picked up by other safety outcomes?).

Still waiting on a paper on predictors.

In the statistical analysis plan (which came out after the cost effectiveness paper was published), the PACE Trial investigators said:

The main analyses will use an informal care unit cost based on the replacement method (where the cost of a homecare worker is used as a proxy for informal care). We will alternatively use a zero cost and a cost based on the national minimum wage for informal care. We will also conduct sensitivity analyses around the costs attached to lost employment.

What they actually did was:

Unpaid informal care from family/friends was measured by asking patients how many hours of care were provided because of fatigue. Alternative methods exist for valuing informal care, with the opportunity cost and replacement cost approaches being the most recognised. We adopted the former and valued informal care at £14.60 per hour based on national mean earnings [16]

They also wrongly claimed:

Sixth, there is similar uncertainty around the most appropriate way of valuing informal care [26]. Alternative approaches were used in the sensitivity analyses and these did not make a substantial difference to the results.
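
To show why the choice of valuation method matters, here is a minimal sketch of the kind of sensitivity analysis the plan describes. Only the £14.60 per hour figure comes from the quoted paper. The replacement-cost and minimum-wage unit costs below are placeholders, not the real figures, and the number of care hours is made up.

```python
OPPORTUNITY_COST_PER_HOUR = 14.60  # from the quoted paper (national mean earnings)


def informal_care_cost(hours, unit_cost_per_hour):
    """Cost of informal care: hours of care multiplied by the chosen unit cost."""
    if hours < 0 or unit_cost_per_hour < 0:
        raise ValueError("hours and unit cost must be non-negative")
    return hours * unit_cost_per_hour


def sensitivity_table(hours, unit_costs):
    """Cost of the same hours of care under each valuation method."""
    return {label: informal_care_cost(hours, cost)
            for label, cost in unit_costs.items()}


# Placeholder unit costs: substitute the real figures before drawing conclusions.
unit_costs = {
    "opportunity cost (paper)": OPPORTUNITY_COST_PER_HOUR,
    "zero cost": 0.0,
    "replacement cost (placeholder)": 18.00,
    "minimum wage (placeholder)": 6.00,
}
for label, cost in sensitivity_table(10, unit_costs).items():
    print(f"{label}: {cost:.2f}")
```

Because the cost scales directly with the unit cost, any group difference in informal care hours shrinks or grows with the method chosen, and vanishes entirely under the zero-cost option.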

PACE Trial protocol said:

Adverse outcomes
Adverse outcomes (score of 5–7 of the self-rated CGI) will be monitored by examining the CGI at all follow-up assessment interviews [49]. An adverse outcome will be considered to have occurred if the physical function score of the SF-36 [28] has dropped by 20 points from the previous measurement. This deterioration score has been chosen since it represents approximately one standard deviation from the mean baseline scores (between 18 and 27) from previous trials using this measure [23,25].

Both of these were changed (for the CGI, it became scores of 6 and 7).

There is no mention that both of these were changed:

Serious deterioration in health was defined as any of the following outcomes: a short form-36 physical function score decrease of 20 or more between baseline and any two consecutive assessment interviews;[16] scores of much or very much worse on the participant rated clinical global impression change in overall health scale at two consecutive assessment interviews;[25] withdrawal from treatment after 8 weeks because of a participant feeling worse; or a serious adverse reaction.
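
The practical difference between the two SF-36 rules can be shown with a short sketch. It reflects one reading of each definition: the protocol rule as a drop of 20 or more points from the previous measurement, and the published rule as a score 20 or more points below baseline at two consecutive assessments. The scores are hypothetical, laid out at weeks 0, 12, 24 and 52.

```python
def flagged_by_protocol_rule(scores):
    """Protocol reading: any drop of 20+ points from the previous measurement."""
    return any(prev - cur >= 20 for prev, cur in zip(scores, scores[1:]))


def flagged_by_published_rule(scores):
    """Published reading: 20+ points below baseline at two consecutive assessments."""
    baseline, followups = scores[0], scores[1:]
    low = [baseline - s >= 20 for s in followups]
    return any(a and b for a, b in zip(low, low[1:]))


# Hypothetical participant who relapses around week 12 and is back at
# baseline by week 24 (scores at weeks 0, 12, 24, 52).
relapse = [60, 35, 60, 60]
print(flagged_by_protocol_rule(relapse))   # True
print(flagged_by_published_rule(relapse))  # False
```

Under these assumptions a deterioration seen at only one assessment is counted by the protocol rule but missed by the published one.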

One person: The serious adverse events and serious adverse reactions were reported, but the non-serious adverse reactions remain totally unreported (despite a dedicated paper on adverse effects). Non-serious adverse events were reported and were similar between groups, but they were aggregated together (they were graded by severity, but this hasn't been reported, so there's a possibility that GET had more severe effects). As the other definitions of adverse effects are quite strict, the true story of safety might be hidden in this data. Relapses significant to patients lasting up to a month could occur without being classed as serious.

Post-hoc additions:

Normal range in fatigue and physical function and both for individuals;

Clinically useful difference in fatigue and physical function and both for individuals.

Correctly reported as non-pre-specified?

Normal range in fatigue and physical function: Yes (post-hoc).

Individual level clinically useful difference: Yes (post-hoc).

Recovery criteria: No. It was claimed to be "pre-specified", but the evidence suggests the changes were made after data unblinding and with knowledge of the components of the recovery criteria.

Details of changes to the mediation analysis are unclear; it seems to be an exploratory analysis.


This is largely based on the protocol. The statistical analysis plan said other things would be reported, but we didn't have time at this stage to cross-check it.