PACE Trial statistical analysis plan

Dolphin

Senior Member
Messages
17,567
Analysis strategy group

The Statistical Analysis Strategy was developed by the PACE Analysis Strategy Group
whose members were:
– Michael Sharpe (Chair, Principal Investigator)
– Rebecca Walwyn, Laura Potts, Tony Johnson and Kim Goldsmith (Statisticians)
– Paul McCrone (Health Economist)
– Peter White and Trudie Chalder (Principal Investigators)
– Julia DeCesare and Hannah Baber (Trial Managers)
So the PIs and a group of people they could order around (I imagine). Maybe this is normal enough.
 

Dolphin

They don't mention the changes, e.g. 1a and 1b are altered (see the protocol below).
(I could miss bits)
Safety outcomes are:
1. Serious deterioration (primary) defined as one or more of the following up to 52 weeks:

a. SF-36 physical function score diminishing by 20 or more points between baseline and any two consecutive assessment interviews.
[Changed. Was: "dropped by 20 points from the previous measurement"]

b. Participant-rated CGI change score of “much worse” or “very much worse” at two consecutive assessment interviews.
[Changed. Was: "a little worse" (CGI of 5), “much worse” or “very much worse”]

c. Withdrawal from therapy (APT, CBT, or GET) later than 8 weeks due to participant’s reported worsening of their condition.

d. A serious adverse reaction.

2. Serious adverse events (includes serious adverse reactions and suspected unexpected serious adverse reactions).

3. Serious adverse reactions (includes suspected unexpected serious adverse reactions)

4. Non-serious adverse events (includes non-serious adverse reactions); numbers, proportions, and examples.

5. Withdrawals from the interventions.

The four components of ‘serious deterioration’ will be reported in addition to the composite outcome.

Protocol:
Adverse outcomes

Adverse outcomes (score of 5–7 of the self-rated CGI) will be monitored by examining the CGI at all follow-up assessment interviews [49]. An adverse outcome will be considered to have occurred if the physical function score of the SF-36 [28] has dropped by 20 points from the previous measurement. This deterioration score has been chosen since it represents approximately one standard deviation from the mean baseline scores (between 18 and 27) from previous trials using this measure [23,25]. Furthermore, the RN will enquire regarding specific adverse events at all follow-up assessment interviews.
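
To make the difference between the two SF-36 rules concrete, here is a rough Python sketch. The function names and the example scores are hypothetical. Two readings are my assumptions rather than anything the documents spell out: that "dropped by 20 points" in the protocol means 20 or more, and that the SAP wording means the score is 20 or more points below baseline at two consecutive assessments.

```python
def protocol_rule(scores, drop=20):
    """Protocol wording: a drop of `drop` or more points from the previous measurement."""
    return any(prev - cur >= drop for prev, cur in zip(scores, scores[1:]))


def sap_rule(scores, drop=20):
    """SAP wording (as assumed here): `drop` or more points below baseline
    at two consecutive follow-up assessments."""
    baseline, follow_ups = scores[0], scores[1:]
    below = [baseline - s >= drop for s in follow_ups]
    return any(a and b for a, b in zip(below, below[1:]))


# Hypothetical SF-36 physical function scores: baseline, then three follow-ups.
one_off_drop = [50, 30, 45, 50]
sustained_drop = [50, 30, 25, 40]

for label, scores in [("one-off drop", one_off_drop), ("sustained drop", sustained_drop)]:
    print(label, "| protocol rule:", protocol_rule(scores), "| SAP rule:", sap_rule(scores))
```

Under these assumptions, a single bad assessment counts as an adverse outcome under the protocol wording but not under the SAP wording, which needs the drop to be sustained.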
 
Dolphin

Some of the secondary efficacy outcomes were dropped or changed (no reason given here, nor is it highlighted):

Efficacy outcomes are:

1. Participant rated Clinical Global Impression (CGI) [31] change category.

2. Anxiety measured by HADS-A subscale of the Hospital Anxiety and Depression Scale [32].

3. Depression measured by HADS-D subscale of the Hospital Anxiety and Depression Scale [32].

4. Six-Minute Walking Test [33].

5. Work and Social Adjustment measured by WSAS [34].

6. Participant Satisfaction (7-point item from very satisfied to very dissatisfied).

7. Centers for Diseases Control (CDC) Symptoms - Number of symptoms [35].

8. Jenkins sleep score [36].

A selection of the above efficacy outcomes will be reported in the primary paper as required to aid interpretation of the primary outcomes; other secondary outcomes will be reported in subsequent papers. The selection will, in part, be determined by space constraints.

Health economics outcomes are:

1. CSRI (service, societal, NHS and insurance/benefits costs) [37].

2. EuroQol [38].

(I could miss bits)

Secondary outcome measures – Secondary efficacy measures

1. The Chalder Fatigue Questionnaire Likert scoring (0,1,2,3) will be used to compare responses to treatment [27].
[Dropped: Became a primary outcome measure]

2. The self-rated Clinical Global Impression (CGI) change score (range 1 – 7) provides a self-rated global measure of change, and has been used in previous trials [45]. As in previous trials, we will consider scores of 1 or 2 as a positive outcome ("very much better" and "much better") and the rest as non-improvement [23].

3. The CGI change scale will also be rated by the treating therapist at the end of session number 14, and by the SSMC doctor at the 52-week review.

[Therapist bit never reported. SSMC doctor rating used if missing score, as I recall]

4. "Recovery" will be defined by meeting all four of the following criteria: (i) a Chalder Fatigue Questionnaire score of 3 or less [27], (ii) SF-36 physical Function score of 85 or above [47,48], (iii) a CGI score of 1 [45], and (iv) the participant no longer meets Oxford criteria for CFS [2], CDC criteria for CFS [1] or the London criteria for ME [40].

[Not mentioned above but has been reported on (with major changes)]

5. The Hospital Anxiety and Depression Scale scores in both anxiety and depression sub-scales [38].

6. The Work and Social Adjustment scale provides a more comprehensive measure of participation in occupational and domestic activities [33].

7. The EuroQOL (EQ-5D) provides a global measure of the quality of life [39].

8. The six-minute walking test will give an objective outcome measure of physical capacity [31].

9. The self-paced step test of fitness [43].
[Looks like it was dropped]

10. The Borg Scale of perceived physical exertion [44], to measure effort with exercise and completed immediately after the step test.
[Looks like it was dropped]

11. The Client Service Receipt Inventory (CSRI), adapted for use in CFS/ME [31], will measure hours of employment/study, wages and benefits received, allowing another more objective measure of function.

12. An operationalised Likert scale of the nine CDC symptoms of CFS [1].
[Changed to presence/absence of symptoms]

13. The Physical Symptoms (Physical Health Questionnaire 15 items (PHQ15)) [35].
[Looks like it was dropped]

14. A measurement of participant satisfaction with the trial will also be taken at 52 weeks [53].
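
For anyone wanting to check data against the protocol's definition of "recovery" (item 4 above), the four criteria can be written as a single check. This is only a sketch: the `Assessment` class and its field names are hypothetical, and the thresholds are copied from the protocol text quoted above, not from any later paper.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    cfq_score: int       # Chalder Fatigue Questionnaire score
    sf36_pf: int         # SF-36 physical function score
    cgi: int             # self-rated CGI change score (1 to 7)
    meets_oxford: bool   # still meets Oxford criteria for CFS
    meets_cdc: bool      # still meets CDC criteria for CFS
    meets_london: bool   # still meets London criteria for ME


def protocol_recovered(a: Assessment) -> bool:
    """All four protocol criteria must hold at once."""
    return (
        a.cfq_score <= 3
        and a.sf36_pf >= 85
        and a.cgi == 1
        and not (a.meets_oxford or a.meets_cdc or a.meets_london)
    )


# Hypothetical participants for illustration only.
fully_recovered = Assessment(2, 90, 1, False, False, False)
much_better_only = Assessment(2, 90, 2, False, False, False)

print(protocol_recovered(fully_recovered))
print(protocol_recovered(much_better_only))
```

Note that a CGI of 2 ("much better") counts as a positive outcome under item 2 but is not enough for recovery under item 4, which needs a CGI of 1.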
 
Dolphin

A little description:
Cognitive behavioural therapy + standardised specialist medical care (CBT). CBT is based on the illness model of fear avoidance, used in the three previous trials of CBT [39-41]. There are three essential elements: (a) assessment of illness beliefs and coping strategies; (b) structuring of daily rest, sleep, and activity, with a graduated return to normal activity; and (c) collaborative challenging of unhelpful beliefs about symptoms and activity.
Dutch research shows that CBT fails, on average, to achieve (b), the graduated return to normal activity.
It also shows why actometers should have been kept as an outcome measure.
 

Dolphin

Confirmation of what we knew: the principal investigators (PDW, MS, TC) had plenty of ways to get a feel for how things were going.

Table 1 Details of Participating Centres

ID | Clinical Service | Centre Leaders
1 | Chronic Fatigue Clinic, St Bartholomew’s Hospital, London | Professor PD White
2 | Chronic Fatigue Syndrome Service, Western General and Astley Ainsley Hospitals, NHS Lothian, Scotland | Dr D Wilks, Professor MC Sharpe
3 | Chronic Fatigue Research Unit, King’s College Hospital, London | Professor T Chalder, Professor S Wessely
4 | Chronic Fatigue Clinic St Bartholomew’s Hospital, London | Dr M Murphy
5 | Oxfordshire Mental Healthcare NHS Trust and Oxford Radcliffe Hospitals Trust, Oxford | Dr B Angus, Professor T Peto, Dr E Feldman
6 | Fatigue Service Royal Free Hampstead NHS Trust, London | Dr G Murphy
7 | Pain Management Centre Frenchay Hospital, Bristol | Dr H O’Dowd

Centres 1 and 4 were combined on 01 June 2006 and are regarded as a single centre for both randomisation and analysis.
 
Dolphin

For letter writers:
I notice they use the acronym SF-36PF for the SF-36 physical functioning subscale. This is one word shorter than "SF-36 PF".
 
Esther12

I was just reading some of the publication history/reviewer comments (I've been putting off something more useful):

http://www.trialsjournal.com/content/14/1/386/prepub

http://www.trialsjournal.com/imedia/2104564710494985_comment.pdf

Looks like an admission that the weak MCIDs were just pulled out of their arses:

49. P 28, 9.2, p 1, l 6. Which test of normality will you use? Reference it.

We did not intend to use a test of normality. Instead we planned an examination of the histograms.

50. P 29, S 9.3, p 2. Is there a reference for the MCID?
We had two primary outcomes, neither of which had well validated MCID defined. In specifying 0.5 SD as a clinically useful difference we were guided by (Sloan, Cella, & Hays 2005) and (Cohen 1988) but in reality we chose the values via discussion within the team. As such, there is no reference.

I didn't really follow this bit (stats is not a strong point):

1. In Section 2.3 Trial Design regarding “sample size calculations taken from the protocol”, I did not see how the sample size calculation matched with the primary statistical analysis method (mixed-effects linear regressions including participant as a random intercept and investigating adding a random slope on time, as you presented on page 29). How is the clustering effect taken into account?

Clustering was not taken into account in the sample size, only in the analysis. This reflected the change in practice over the course of the trial. At the point when the sample size was calculated there was no consensus in the medical statistics community about allowing for clustering in individually randomised trials. By the time the analysis was undertaken it was part of the CONSORT statement for non-pharmacological treatment trials. Even so, there was no recommended method for handling clustering in a trial with the multiple membership data structure this trial had. As such this trial is an exemplar and this SAP of interest.

Here's a copy of that section from their original submission (I haven't checked whether anything changed for the final paper; pardon the funny formatting, I fixed a lot!):

http://www.trialsjournal.com/imedia/9579067908006018_manuscript.pdf

9.2 Descriptive Statistics for Outcome Measures

The distributions of the Likert Chalder Fatigue scores will be presented in frequency histograms both overall and by intervention at each assessment point (baseline, weeks 12, 24, and 52). The distribution of the SF-36 physical function subscale score will also be presented in histograms both overall and by intervention at each assessment point. It is anticipated that the distributions of the Likert Chalder Fatigue score and the SF-36 physical function subscale score will be approximately normally distributed. Summary statistics (minimum, maximum, mean and standard deviation, median and inter-quartile range) will be tabulated and the response profiles plotted for each continuous score both overall and by intervention at each assessment point. The response profiles over time will also be plotted by outcome and intervention. The mean scores (Likert Chalder Fatigue scores and SF-36 physical function subscale scores) within each main therapist’s caseload will be calculated by therapy (APT, CBT and GET).

These means will be plotted to investigate the level of variability in participant outcomes between therapists and to examine the distribution of these summary statistics (i.e. whether they are normally distributed or skewed). Differences in the mean scores within each main doctor’s caseload will also be calculated and similar plots based on these presented.

9.3 Primary Analysis (including method of analysis)

The primary analysis addressing primary objectives (1) to (5) and secondary objectives (1) and (3) will be based on the principle of intention-to-treat. If missing data are estimated using multiple imputation this analysis will be based on the intention-to-treat sample (section 4.2); if missing data are estimated via prorating and maximum likelihood, the analysis will be based on the available-case sample (section 4.2) and will exclude any participants with no follow-up data in a “modified ITT” analysis. The primary outcomes of fatigue and physical disability will be analysed separately using two mixed-effects linear regressions, each including participant as a random intercept and investigating adding a random slope on time. Time (investigating the possibility of linearising across 12, 24 and 52 weeks), the time-by-intervention interaction, baseline CFQ Likert score, baseline SF-36 physical function score and the design factors (i.e. centre, CDC criteria, London criteria and current depressive disorder) will be included as fixed effects. Primary interest will be in the fixed contrasts specified in section 6.5 at 52 weeks. The statistical models used in the analysis will be reported in full.

Clinical importance of the mean differences in primary outcomes at 52 weeks. This will be judged by reference to the trial sample SDs at baseline in this trial supported by estimates from other sources. Specifically, a difference between means of two intervention groups, at 52 weeks, of 0.3 SD will be regarded as of minimal clinical importance (a MCID) and of 0.5 SD as a clinically useful difference. From published literature on these scales these differences can be translated into 5 points on the SF-36PF, and 1.2 points on the CFQ, for minimal clinical importance and 8 points on the SF-36PF, and 2.0 points on the CFQ, for clinically useful.
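
As a quick check on that last paragraph, the point thresholds can be turned back into the standard deviations they imply. This is just arithmetic on the figures quoted above; the variable and function names are mine.

```python
# Point thresholds quoted in the SAP text, keyed by the SD multiple they represent.
thresholds = {
    "SF-36PF": {0.3: 5.0, 0.5: 8.0},
    "CFQ": {0.3: 1.2, 0.5: 2.0},
}


def implied_sd(points, sd_multiple):
    """SD that would make `points` equal to `sd_multiple` standard deviations."""
    return points / sd_multiple


for scale, by_multiple in thresholds.items():
    for multiple, points in by_multiple.items():
        sd = implied_sd(points, multiple)
        print(f"{scale}: {points} points at {multiple} SD implies an SD of about {sd:.1f}")
```

So the SF-36PF figures correspond to an SD of roughly 16 to 17 points and the CFQ figures to an SD of about 4.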

Has that info been released? I don't remember histograms for SF36-PF and Chalder Fatigue outcomes?


I wonder if this data would have been of interest:

2. Following my previous question, the sample size calculation was based on the primary outcome/analysis. I am thinking whether it is helpful to provide what statistical power can be achieved for analyzing the secondary outcomes and economic outcomes based on this sample size.

The sample size calculation reported was used to design the trial. We did not carry out power calculations for the secondary and economic outcomes and doing so now would be post-hoc and of limited value as a result. The confidence intervals reported in the principal papers provide estimates of the precision of the results
 

Dolphin

Thanks for that, Esther12.

Has that info been released? I don't remember histograms for SF36-PF and Chalder Fatigue outcomes?
I don't recall seeing any histograms of them.

I wonder if this data would have been of interest:
2. Following my previous question, the sample size calculation was based on the primary outcome/analysis. I am thinking whether it is helpful to provide what statistical power can be achieved for analyzing the secondary outcomes and economic outcomes based on this sample size.

The sample size calculation reported was used to design the trial. We did not carry out power calculations for the secondary and economic outcomes and doing so now would be post-hoc and of limited value as a result. The confidence intervals reported in the principal papers provide estimates of the precision of the results
(Possibly me being overly awkward.) I find it slightly interesting that they justified such a big, expensive trial (£5m, including close to £1m, I think it was, for an extension to recruit more participants) on the basis of power calculations that were no longer relevant once they changed how they analysed the trial (using continuous rather than categorical measures).

Put another way, they might have been able to do what they did in a way that cost £1 or £2 million less.
 

Dolphin

Health economics objectives

[..]

The secondary hypotheses are:

1. Higher healthcare costs are associated with being female, being older and having comorbid conditions, particularly mood disorders and having other symptom-based diagnoses.

2. Higher total societal costs are associated with being male, being younger, having more severe physical disability, pervasive passivity (measured by actigraphy), certain illness beliefs, and having comorbid conditions, particularly mood disorders and having other symptom-based diagnoses.
None of this was reported, even though the health economics paper (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0040808) was published in PLOS ONE, which has no length limit:

PLOS ONE considers manuscripts of any length. There are no explicit restrictions for the number of words, figures, or the length of the supporting information, although we encourage a concise and accessible writing style.
 

Dolphin

Minor point probably

If one looks at:
Table 3 Derivation of secondary outcomes
one can see it mentions summary data e.g.
Participant-rated CGI
Positive change; no change; negative change
(there are seven options on the CGI; this is a collapsed form, i.e. 1&2; 3-5; 6&7)

CDC Symptoms (#)
Total (sum) of CDC symptoms 1 to 8
(people rated these individually on scales that weren't yes/no)

However, for:
Participant satisfaction
they aren't in a collapsed/summary form:
Very satisfied; moderately satisfied; slightly satisfied; neither; slightly dissatisfied; moderately dissatisfied; very dissatisfied

However, in the paper, they did collapse them:

At 52 weeks, participants rated satisfaction with treatment received on a 7-point scale, condensed into three categories to aid interpretation (satisfied, neutral, or dissatisfied).
 

Dolphin

Blinding of randomised interventions

[..]

The steps taken to minimise and measure bias were:

[..]

3. Equipoise was actively encouraged throughout the planning and course of the trial.

I'm not an expert on equipoise, but I'm not sure it held in this case:

An ethical dilemma arises in a clinical trial when the investigator(s) begin to believe that the treatment or intervention administered in one arm of the trial is significantly outperforming the other arms. A trial should begin with a null hypothesis, and there should exist no decisive evidence that the intervention or drug being tested will be superior to existing treatments or effective at all.
I'm not convinced this was the position of the investigators.

For example, why is APT treated differently from CBT and GET in the analysis? APT is compared to CBT and APT is compared to GET, but CBT generally isn't compared to GET.

The trial protocol had:

Assumptions
The existing evidence does not allow precise estimates of improvement with the trial treatments. However the available data suggests that at one year follow up, 50 to 63% of participants with CFS/ME had a positive outcome, by intention to treat, in the three RCTs of rehabilitative CBT [18,25,26], with 69% improved after an educational rehabilitation that closely resembled CBT [43]. This compares to 18 and 63% improved in the two RCTs of GET [23,24], and 47% improvement in a clinical audit of GET [56]. Having usual rather than specialist medical care allowed 6% to 17% to improve by one year in two RCTs [18,25]. There are no previous RCTs of APT to guide us [11,12], but we estimate that APT will be at least as effective as the control treatments of relaxation and flexibility used in previous RCTs, with 26% to 27% improved on primary outcomes [23,26]. We propose that a clinically important difference would be between 2 and 3 times the improvement rate of SSMC.

If I recall correctly, there is material in the manuals that promotes CBT and GET.

-------------------------------------------------------------------

4. Baseline staff expectations regarding the outcome of the trial were recorded.
It's not much good recording expectations if you don't publish them.

They also said:
Beliefs and expectations of treatment and who is running the trial

The trial has been designed and is being managed by many different healthcare and research professionals, including doctors, therapists, health economists, statisticians and a representative of a patient charity. The Trial Management Group includes five physicians and four psychiatrists. To measure any bias consequent upon individual expectations, all staff involved in the PACE trial recorded their expectations as to which intervention would be most efficacious before their participation, and we will publish these data after the end of the trial.

http://www.biomedcentral.com/1471-2377/7/6/comments#306608

Perhaps it might be worthwhile somebody requesting this data.
 
Dolphin

Withdrawals from intervention

The decision to withdraw a participant from an intervention is made by the clinician or the participant (active withdrawals). The number of active withdrawals (broken down by initiator (participant, clinical staff, both)) will be reported by intervention and centre, and by interval from randomisation. The most common reasons for withdrawal will be summarised.
I can't see any place where this is given (I've also checked the appendix of the Lancet paper).

If I recall correctly, the CONSORT trial profile (Figure 1) often/usually has such data (indeed, I think CONSORT encourages it?), but all we get is "withdrew" and the numbers.
 

Dolphin

I think this one might be reasonably important:

Multiplicity adjustments will be made as follows:

1. The following five comparisons will be made using two-sided hypothesis tests (alpha = 0.05) at 52 weeks: APT versus SSMC, CBT versus SSMC, GET versus SSMC, CBT versus APT, GET versus APT. For the co-primary outcomes, fatigue and disability, and for the secondary outcome, the participant-rated CGI, P-values will be presented unadjusted for multiplicity.

2. In addition Bonferroni adjustment (0.05/5) will be applied separately to each of the three outcomes to control the outcome-wise type I error rate at 5%.

If one looks at the Lancet paper, it appears they have only done it for fatigue and disability and not participant-rated CGI.

They repeat in point 4 that CGI is supposed to be adjusted for:
4. No adjustment will be made within the principal paper(s) for other analyses including those for safety, secondary outcomes (except the CGI) [26], and health economics.

One can see them doing it for fatigue and disability in Table 3. For all but one, the adjustment means the p-values are multiplied by 5 (the odd one out is a p-value of .38, which becomes .99; p-values can only be between 0 and 1, so this makes sense).

However, Table 5 has: Participant-rated clinical global impression of change in overall health
Odds ratio (positive change vs negative or minimum changes)
Compared with specialist medical care

APT: 1·3 (0·8–2·1); p=0·31

CBT: 2·2 (1·2–3·9); p=0·011

GET: 2·0 (1·2–3·5); p=0·013

Compared with adaptive pacing therapy

CBT: 1·7 (1·0–2·7); p=0·034

GET: 1·5 (1·0–2·3); p=0·028

I think both of the CBT and both of the GET results would no longer be statistically significant with Bonferroni adjustment (basically multiplying the p-values by 5 at that level).
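
A quick sketch of that arithmetic, using the unadjusted CGI p-values from Table 5 as quoted above. The `bonferroni` helper is my own; multiplying each p-value by the number of comparisons and capping at 1 is equivalent to testing against 0.05/5.

```python
def bonferroni(p, n_comparisons=5):
    """Bonferroni-adjusted p-value: multiply by the number of comparisons, cap at 1."""
    return min(1.0, p * n_comparisons)


# Unadjusted p-values for participant-rated CGI, as quoted from Table 5.
cgi_p_values = {
    "APT vs SMC": 0.31,
    "CBT vs SMC": 0.011,
    "GET vs SMC": 0.013,
    "CBT vs APT": 0.034,
    "GET vs APT": 0.028,
}

adjusted = {name: bonferroni(p) for name, p in cgi_p_values.items()}

for name, p in cgi_p_values.items():
    adj = adjusted[name]
    print(f"{name}: unadjusted p={p}, adjusted p={adj:.3f}, below 0.05: {adj < 0.05}")
```

None of the five adjusted values comes out below 0.05; the closest is CBT vs SMC at about 0.055.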
 

Dolphin

Interpretation will be done as indicated below:

1. Marginal interpretation of the results will be of primary interest and will be based on the size and precision of the observed differences between interventions with reference to point estimates and unadjusted 95% CIs.

2. Intervention recommendations will also take into consideration the consistency of effects

a. across any supportive intervention contrasts,
b. across sensitivity analyses, primary outcomes and time points,
c. across efficacy, safety and cost analyses,
and
d. with the results of previous studies, and clinical and consumer opinions.
I wonder what they had in mind with regard to consumer opinions. They didn't show much interest in, or empathy with, opinions expressed about the papers.

The consistency-of-effects point is interesting. If one looks at objective measures, the results are fairly consistent for CBT, with no benefit over APT or SMC. GET did improve results on the 6 minute walking test, but not dramatically.
 

Dolphin

Presentation will occur as follows:
1. All analyses undertaken will be reported as far as practical (regardless of statistical significance) [65].
2. Estimated effects will be presented with unadjusted 2-sided 95% CIs and P-values.
3. P-values adjusted for multiplicity will also be presented and explained.

Quite a lot of analyses have not been reported, as highlighted. There were plenty of opportunities to do so, e.g. the Lancet paper had an appendix, the PLOS ONE paper had no word limit, the trial had its own website, etc.
 

Dolphin

Probably not that important:
The timing of baseline and follow-up data will be summarised overall and by intervention for each assessment visit in terms of the median (lower quartile, upper quartile, minimum and maximum) number of days from randomisation and the proportion falling outside guideline timeframes.
I don't think this was done.
 

Dolphin

I was just thinking a bit more about the lack of the histograms:

Descriptive statistics for outcome measures

The distributions of the Likert Chalder fatigue scores will be presented in frequency histograms both overall and by intervention at each assessment point (baseline, weeks 12, 24, and 52). The distribution of the SF-36 physical function subscale score will also be presented in histograms both overall and by intervention at each assessment point. It is anticipated that the distributions of the Likert Chalder fatigue score and the SF-36 physical function subscale score will be approximately normally distributed. Summary statistics (minimum, maximum, mean and standard deviation, median and inter-quartile range) will be tabulated and the response profiles plotted for each continuous score both overall and by intervention at each assessment point. The response profiles over time will also be plotted by outcome and intervention.

Not publishing histograms enabled them to make claims about "return to normal" that others couldn't easily see were dubious.

Similarly with regard to recovery, histograms would likely show that recovery (at 85/90/95/100 in the SF-36 PF, for example) was not that common.

These might be good things to do a Freedom of Information Act request on. These should already have been created and thus there should be no work involved in anyone preparing them.
 

Dolphin

Given the level of missing data for the six minute walking test, it is disappointing that no information has been given on it, including analyses of whether the data were missing at random or whether the group without data differed in some way from the group that completed the test.

I think missing data is much more common for the 6 minute walking test than for the other measurements, although I don't imagine pro-rating was done.

Description of missing data

Where available, the reasons for missing baseline and follow-up data will be summarised overall and by intervention at the visit and scale levels. This will be done using relevant information included in the comments fields of the database. It is anticipated that such information will be available principally for visit and scale missing data.

Where the level of item-missing data is borderline between ‘minimal’ and ‘important’ (see Methods for Handling Dropouts and Missing Data), the appropriateness of prorating will be evaluated using the checks outlined by Fayers et al. [56]. Assumptions regarding the nature of the missing data mechanism (that is, MAR as compared to MCAR and MAR, conditional on the variables included in the substantive model as compared to additional variables) will be evaluated by looking descriptively at the statistical associations between whether or not data is missing and any potential predictors, including those generated by looking at the comments fields or the data.
 

Dolphin

Comparison of losses to follow-up
Losses to follow-up will be reported at 13, 26, and 52 weeks by intervention and centre. Narrative summaries will be given of the reasons when known.
(i) Losses to follow-up weren't reported by centre, as far as I know.

(ii) No narrative summaries of the reasons were given.