• Welcome to Phoenix Rising!

    Created in 2008, Phoenix Rising is the largest and oldest forum dedicated to furthering the understanding of and finding treatments for complex chronic illnesses such as chronic fatigue syndrome (ME/CFS), fibromyalgia (FM), long COVID, postural orthostatic tachycardia syndrome (POTS), mast cell activation syndrome (MCAS), and allied diseases.


PACE Trial statistical analysis plan

biophile

Places I'd rather be.
Messages
8,977
A randomised trial of adaptive pacing therapy, cognitive behaviour therapy, graded exercise, and specialist medical care for chronic fatigue syndrome (PACE): statistical analysis plan.

Walwyn R, Potts L, McCrone P, Johnson AL, Decesare JC, Baber H, Goldsmith K, Sharpe M, Chalder T, White PD.

Trials. 2013 Nov 13;14(1):386. [Epub ahead of print]

doi:10.1186/1745-6215-14-386

Abstract (provisional)

BACKGROUND: The publication of protocols by medical journals is increasingly becoming an accepted means for promoting good quality research and maximising transparency. Recently, Finfer and Bellomo have suggested the publication of statistical analysis plans (SAPs). The aim of this paper is to make public and to report in detail the planned analyses that were approved by the Trial Steering Committee in May 2010 for the principal papers of the PACE (Pacing, graded Activity, and Cognitive behaviour therapy: a randomised Evaluation) trial, a treatment trial for chronic fatigue syndrome. It illustrates planned analyses of a complex intervention trial that allows for the impact of clustering by care providers, where multiple care providers are present for each patient in some but not all arms of the trial.

RESULTS: The trial design, objectives and data collection are reported. Considerations relating to blinding, samples, adherence to the protocol, stratification, centre and other clustering effects, missing data, multiplicity and compliance are described. Descriptive, interim and final analyses of the primary and secondary outcomes are then outlined.

CONCLUSIONS: This SAP maximises transparency, providing a record of all planned analyses, and it may be a resource for those who are developing SAPs, acting as an illustrative example for teaching and methodological research. It is not the sum of the statistical analysis sections of the principal papers, being completed well before individual papers were drafted. Trial registration: ISRCTN54285094 assigned 22 May 2003; first participant was randomised on 18 March 2005.

PMID: 24225069 ( http://www.ncbi.nlm.nih.gov/pubmed/24225069 )

http://www.trialsjournal.com/content/14/1/386/abstract

(full text) http://www.trialsjournal.com/content/pdf/1745-6215-14-386.pdf
 

biophile

Places I'd rather be.
Messages
8,977
This is 45 pages long, so I will probably not have the opportunity to review it fully anytime soon, and I am in no urgent rush to do so, but I did do a very brief search to see if they explain themselves further about some of the most controversial changes. Despite the talk of "maximising transparency" and the following statement in the 2011 Lancet paper ...

"The statistical analysis plan was finalised, including changes to the original protocol, and was approved by the trial steering committee and the data monitoring and ethics committee before outcome data were examined."

I could not find any details about the normal range in fatigue and physical function in the statistical analysis plan, nor on the revised recovery criteria. Wasn't the statistical analysis plan supposed to explain everything? Can anyone else find it? Maybe it really was post-hoc in the sense that it was tacked on after even the 'final' changes were made?

As many of you already know, the normal range, in physical function in particular, was derived using questionable methods on inappropriate dataset(s), and the justification for the change was based on a schoolboy error in interpreting statistical data (i.e. confusing the mean score with the median score, which is inexcusable when considering the amount of professional scrutiny that PACE supposedly received before approval and publication).

I was hoping to see further details in the statistical analysis plan about this, but I guess now we will have to wait until their blunder is publicly exposed enough that PACE are forced to explain themselves in writing.

For those of you who have criticised the rather small thresholds for clinical improvement, which were lowered dramatically from the original protocol, I did find this gem, though:

Clinical importance of the mean differences in primary outcomes at 52 weeks

This will be judged by reference to the trial sample SDs at baseline in this trial supported by estimates from other sources. Specifically, a difference between means of two intervention groups, at 52 weeks, of 0.3 SD will be regarded as of minimal clinical importance (a MCID) and of 0.5 SD as a clinically useful difference. From published literature on these scales these differences can be translated into 5 points on the SF-36PF, and 1.2 points on the CFQ, for minimal clinical importance and 8 points on the SF-36PF, and 2.0 points on the CFQ, for clinically useful.

Based on PACE data too, the MCID for fatigue would be 1 single point out of 33! (3.8*0.3=1.14). [Edit: Perhaps it would be rounded up to 2, since it is not possible to score 1.14, and 1 would not technically meet MCID]. Also note that when the range is 0-100, the SF-36/PF scale is in 5 point increments, so 5 points for MCID is effectively a single point out of 20, the smallest change possible, just as 1 point is for the CFQ Likert scoring (0-33), a far cry from the original goalposts. Excluding the severely affected but having strict cut off points had the effect of lowering the baseline SD.
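As a sanity check on that arithmetic, here is a quick sketch (the 3.8 baseline SD for the Likert CFQ and the 5-point SF-36 step are the figures quoted in this thread, not values I have verified against the trial data):

```python
import math

# SAP thresholds: 0.3 SD = minimal clinically important difference (MCID),
# 0.5 SD = clinically useful difference.
MCID_SD, USEFUL_SD = 0.3, 0.5

# Baseline SD for the Likert-scored Chalder Fatigue Questionnaire (0-33),
# as quoted above.
cfq_baseline_sd = 3.8

mcid_raw = cfq_baseline_sd * MCID_SD      # 1.14 points on a 0-33 scale
# CFQ scores are whole numbers, so the smallest achievable difference
# that meets the raw threshold is the next integer up.
mcid_integer = math.ceil(mcid_raw)        # 2 points

useful_raw = cfq_baseline_sd * USEFUL_SD  # 1.9 points

# The SF-36 physical function subscale (0-100) moves in 5-point steps,
# so the literature-derived 5-point MCID is the smallest change the
# instrument can register at all.
sf36_step = 5
sf36_mcid = 5
```

In other words, on both scales the "minimal clinically important difference" lands on the smallest non-zero change the instrument can record.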

And this for those who are interested in the distribution of the scores:

Descriptive statistics for outcome measures

The distributions of the Likert Chalder fatigue scores will be presented in frequency histograms both overall and by intervention at each assessment point (baseline, weeks 12, 24, and 52). The distribution of the SF-36 physical function subscale score will also be presented in histograms both overall and by intervention at each assessment point. It is anticipated that the distributions of the Likert Chalder fatigue score and the SF-36 physical function subscale score will be approximately normally distributed. Summary statistics (minimum, maximum, mean and standard deviation, median and inter-quartile range) will be tabulated and the response profiles plotted for each continuous score both overall and by intervention at each assessment point. The response profiles over time will also be plotted by outcome and intervention.
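The promised tabulation is routine to produce; a minimal sketch using hypothetical scores and the standard library only:

```python
import statistics

def summary(scores):
    """Min, max, mean, SD, median and inter-quartile range -- the
    statistics the SAP says will be tabulated for each continuous
    outcome at each assessment point."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return {
        "min": min(scores),
        "max": max(scores),
        "mean": statistics.mean(scores),
        "sd": statistics.stdev(scores),
        "median": statistics.median(scores),
        "iqr": q3 - q1,
    }

# Hypothetical SF-36 physical function scores for one arm at one
# assessment point.
row = summary([30, 35, 40, 45, 50, 55, 60, 65])
```

Which underlines the point made later in the thread: none of this is hard to generate, so its absence from the published papers is hard to explain.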
 

Dolphin

Senior Member
Messages
17,567
This journal allows online comments. The threshold for online comments is a lot lower than published letters so hopefully some people will discuss some of the issues as comments.
 

Dolphin

Senior Member
Messages
17,567
Thanks biophile

Based on PACE data too, the MCID for fatigue would be 1 single point out of 33! (3.8*0.3=1.14). Also note that when the range is 0-100, the SF-36/PF scale is in 5 point increments, so 5 points for MCID is effectively a single point out of 20, the smallest change possible, just as 1 point is for the CFQ Likert scoring (0-33), a far cry from the original goalposts. Excluding the severely affected but having strict cut off points had the effect of lowering the baseline SD.
Small point: I'm not an expert on MCIDs and the like, but there is a chance one might not round down, i.e. the MCID might need to be at least 1.14, which in practice means 2. Not sure on this, as I say.
 

Sea

Senior Member
Messages
1,286
Location
NSW Australia
So an improvement of 8 points on the SF-36 is clinically useful, but a serious deterioration is defined as a decline of at least 20 points at two consecutive assessments.

"Safety outcomes are:

1. Serious deterioration (primary) defined as one or more of the following up to 52 weeks:

  1. SF-36 physical function score diminishing by 20 or more points between baseline and any two consecutive assessment interviews."
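A small sketch of how that safety definition reads in practice (a hypothetical helper, not trial code): the drop must be present at two assessment interviews in a row, so a transient 20-point fall never registers.

```python
def serious_deterioration(baseline, followups, drop=20):
    """True if the SF-36 physical function score sits `drop` or more
    points below baseline at two consecutive follow-up assessments."""
    below = [baseline - score >= drop for score in followups]
    return any(a and b for a, b in zip(below, below[1:]))

# A 25-point fall at week 24 that rebounds by week 52 does not count
# as serious deterioration under this definition...
transient = serious_deterioration(60, [55, 35, 50])
# ...while the same fall sustained across weeks 24 and 52 does.
sustained = serious_deterioration(60, [55, 35, 30])
```

Contrast that with the 8-point threshold for a "clinically useful" improvement: harm has to be both larger and sustained before it counts.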
 

anciendaze

Senior Member
Messages
1,841
Simple observation: moving the threshold by 5 points is not a significant change because the threshold is an arbitrary number; crossing the threshold by 5 points still constitutes recovery, an important goal of this research. Of course using original recovery criteria on those who entered the trial at the altered threshold would be wrong. Requests for data allowing anyone else to check that this did not happen are "vexatious".

Anyone see an internal contradiction in this argument?

I think part of the criticism of the actual numbers is irrelevant. The important point is the interpretation casual professional readers are likely to place on PACE data. If the population distribution were normal, being 1 SD from the mean would not be a serious problem. This could also be applied to patients with heart failure or COPD who fall in this same range. The only difference here is that "we know" these people have real medical problems, while "we know" ME/CFS patients have somatoform mental disorders.

Likewise, ignoring age in comparing sample and population leads to an implication I have not seen mentioned. If we only go by medical judgment of physical condition there is no reason a 70 year-old in good physical condition should be receiving a pension. His/her performance on objective measures of physical function like that 6 minute walk could well be twice the score of many ME/CFS patients.

The important confusion was in the study from the outset, ignoring the difference between the healthy and the ill, the young and the aged. Numbers are far less important here than perceptions. The extent to which those responsible for the trial have defended methods, maintained claims, and even extended them to define recovery, without clarifying important points tends to imply that deception was deliberate.

To the extent that the trial was rigorous it was virtually meaningless. Moving goalposts by 5 points also implies that simply crossing such an arbitrary boundary does not constitute meaningful recovery. No stronger criteria have been suggested, nor has evidence been presented which might show the trial met these. Public relations and professional perceptions seem to have been the main target. Scientific methods were used to add a patina of objectivity to a fundamentally subjective exercise in which those running the trial tried hard to transfer their own biases to patients, largely failing in the process.

While organizers may be prevented from disclosing private medical information about patients, the patients themselves remain free to report their participation in the trial, if they so choose. With hundreds involved we might expect to hear testimonials about "how PACE returned me to normal life".

Am I missing something?
 

user9876

Senior Member
Messages
4,556
The distributions of the Likert Chalder fatigue scores will be presented in frequency histograms both overall and by intervention at each assessment point (baseline, weeks 12, 24, and 52).

I'm pretty sure that this hasn't been done. They also say they expect the distributions to be normal, which has an interesting implication: they expect all patients to respond in the same way. I would have thought that having a treatment-effective/treatment-ineffective split would lead to a bimodal distribution.
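That intuition is easy to illustrate with a toy simulation (entirely made-up numbers, nothing to do with the actual PACE distributions): pooling a responder subgroup and a non-responder subgroup gives a two-component mixture, and hardly anyone scores near the pooled mean.

```python
import random
import statistics

random.seed(1)

# Hypothetical post-treatment fatigue-like scores: responders cluster
# low, non-responders cluster high.
responders = [random.gauss(8, 2) for _ in range(500)]
non_responders = [random.gauss(24, 2) for _ in range(500)]
pooled = responders + non_responders

# In a bimodal mixture the pooled mean falls in the trough between the
# two modes, so few observations lie near it.
pooled_mean = statistics.mean(pooled)
near_mean = sum(1 for x in pooled if abs(x - pooled_mean) < 2)
near_modes = sum(1 for x in pooled if abs(x - 8) < 2 or abs(x - 24) < 2)
```

A histogram of `pooled` would show the two humps at a glance, which is exactly why the frequency histograms the plan promises would matter.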

I've only skimmed the document, but they don't seem to justify the use of the 'likert' scoring on the CFQ. The paper they cite tried to validate the bimodal scoring, which suggests to me that there has been no attempt to validate the scale using the 'likert' scoring. It's also worth noting that in the paper they cite they say something along the lines of needing to report mental and physical fatigue separately, because the variance falls into two major principal components.
 
Messages
13,774
So an improvement of 8 points on the SF-36 is clinically useful, but a serious deterioration is defined as a decline of at least 20 points at two consecutive assessments.

"Safety outcomes are:

1. Serious deterioration (primary) defined as one or more of the following up to 52 weeks:

  1. SF-36 physical function score diminishing by 20 or more points between baseline and any two consecutive assessment interviews."

That stood out to me too.

No data on 'serious' improvement, or 'clinically important' deterioration either.
 

Simon

Senior Member
Messages
3,789
Location
Monmouth, UK
The paper said:
The distributions of the Likert Chalder fatigue scores will be presented in frequency histograms both overall and by intervention at each assessment point (baseline, weeks 12, 24, and 52).
I'm pretty sure that this hasn't been done. They also say they expect the distributions to be normal, which has an interesting implication: they expect all patients to respond in the same way. I would have thought that having a treatment-effective/treatment-ineffective split would lead to a bimodal distribution.
They said they would present more than that, and for secondary as well as primary outcomes:
Descriptive statistics for outcome measures
... The distribution of the SF-36 physical function subscale score will also be presented in histograms both overall and by intervention at each assessment point. It is anticipated that the distributions of the Likert Chalder fatigue score and the SF-36 physical function subscale score will be approximately normally distributed. Summary statistics (minimum, maximum, mean and standard deviation, median and inter-quartile range) will be tabulated and the response profiles plotted for each continuous score both overall and by intervention at each assessment point. The response profiles over time will also be plotted by outcome and intervention...

The distributions of all secondary efficacy outcomes will be presented in histograms (continuous/count) or bar charts (ordinal/binary) both overall and by intervention at each assessment point...

Summary statistics will be further plotted using line graphs for each outcome across time by intervention.

However, this is a stats analysis plan, and as the first step of any stats analysis is to plot out the data, they may simply mean 'present' as in produce the plots for internal use as part of the analysis, as opposed to 'publish'. If so, I would have thought that was just the kind of information that would be swiftly released in response to an FOI request, if anyone was interested in making one.
 
Messages
15,786
They said they would present more than that, and for secondary as well as primary outcomes:

However, this is a stats analysis plan, and as the first step of any stats analysis is to plot out the data, they may simply mean 'present' as in produce the plots for internal use as part of the analysis, as opposed to 'publish'. If so, I would have thought that was just the kind of information that would be swiftly released in response to an FOI request, if anyone was interested in making one.
One criticism from the instructor of the Coursera statistics course is for when studies don't include histograms. A lot of false trends get outed just by looking at the distribution of the pretty dots.
 

Simon

Senior Member
Messages
3,789
Location
Monmouth, UK
Basic commentary on the Stats Analysis Plan

Mmm, this was a good read. Thought I'd post up a few of my notes on this.

The good
First off, they published the plan, which is rare in clinical trials. Second, they did some very sophisticated analysis to take account of variables that might affect results. So when looking at the differences between the groups they adjusted for other factors including treatment centre and criteria (Oxford, CDC and 'London'), as well as adjusting for the baseline score (i.e. taking into account how disabled/fatigued someone was at the start of the trial):
The primary analysis of therapy effect will be adjusted by the factors used for stratification at randomisation (that is, centre, CDC criteria, London criteria and current depressive disorder) [12,49] and by the baseline assessment of the outcome variable.
I won't go into the detail (because I don't understand it all :)) but here's an example of the kind of fancy stuff they did:
Method for handling other clustering effects

Outcomes at weeks 12, 24 and 52 are nested within participants. The primary method for handling clustering associated with repeated measurements will be to fit a cluster-specific random effects model [51-53] including the participant as a random intercept, and investigating the addition of a random slope over time. Where therapy effects cannot be interpreted as population-averaged effects because outcomes are binary, a population-average (GEE) model will also be fitted.
You get the idea.
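For a rough feel of what the random-intercept part buys them, here is a toy simulation (arbitrary variance components, not PACE estimates): outcomes at weeks 12, 24 and 52 share a participant-level effect, so they are correlated, and a model that ignored this would overstate the effective sample size.

```python
import random
import statistics

random.seed(7)

SD_BETWEEN = 4.0  # spread of the participant-level random intercepts
SD_WITHIN = 2.0   # assessment-to-assessment noise within a participant

# Each participant's three outcomes are their intercept plus noise,
# mirroring "outcomes at weeks 12, 24 and 52 are nested within
# participants".
participants = [
    [intercept + random.gauss(0, SD_WITHIN) for _ in range(3)]
    for intercept in (random.gauss(0, SD_BETWEEN) for _ in range(300))
]

# The variance of per-participant means is dominated by the shared
# intercept; within-participant variance reflects only the noise.
between_var = statistics.variance([statistics.mean(p) for p in participants])
within_var = statistics.mean([statistics.variance(p) for p in participants])

# True intraclass correlation for these settings: 16 / (16 + 4) = 0.8.
icc = SD_BETWEEN**2 / (SD_BETWEEN**2 + SD_WITHIN**2)
```

The cluster-specific random effects model the plan names estimates those two variance components jointly; the GEE alternative instead targets the population-averaged effect directly.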

Transparency issues
The paper makes many excellent points about the value of transparency:
Its publication enables any changes to the original plan to be laid out, increasing the scientific rigour and transparency with which the principal analyses are currently reported. Maximum transparency regarding what decisions were made a priori could be achieved by publishing the statistical analysis plan, which has been approved by the Trial Steering Committee (TSC), before the results of a study are known.

Assessment of the validity of the analyses, reporting and consequent interpretation would also be made easier by the increased visibility of selective or misreporting. This may, in turn, encourage more balanced, accurate and complete reporting of results and ultimately help to raise the standard of trial analyses. Peer review has particular advantages, as it encourages dialogue, the quality of which is likely to be improved by the level of detail given. Knowledge of this added scrutiny should, in turn, act to promote the quality of the submitted plan. This process would be especially valuable if the research is anticipated to generate debate or if it might have a large impact on clinical practice.
They neglect to mention that most of these noble ideals were not met in this case, as the plan was published 3.5 years after it was agreed and 2.5 years after the main paper it relates to came out.

'Recovery' statistical analysis not included
Unfortunately, the highly controversial PACE redefinition of recovery was not covered by this plan, and instead seems to fall under 'exploratory' analysis. Defining recovery should never be exploratory; it really isn't that difficult, and you shouldn't have to look at your results before you can say what 'recovery' is:
The SAP supplements the published protocol [13], the main clinical [14] and health economics [15] papers and the authors’ reply [16] to a selection of correspondence published by the Lancet [17-24]...
[Note, not the Recovery paper]

It is intended that the results reported in these papers will follow the strategy set out here; subsequent papers of a more exploratory nature will not be bound by this strategy but will be expected to follow the broad principles laid down for the principal papers.

Think I'll take a break at this stage.
 

Simon

Senior Member
Messages
3,789
Location
Monmouth, UK
Each post even more exciting than the last! Brace yourselves...

Another good thing laid out in the stats plan was the intention to check that the changes seen in the primary outcomes of fatigue and self-rated function were matched by changes in secondary outcomes such as depression and the 6-minute walking test.

Secondary objectives:
1. Is the pattern of results relating to the primary objectives replicated with the outcome as the participants’ self-rated clinical global impression change rating? ...
3. Are the differences across interventions in the primary outcomes associated with similar differences in secondary outcomes?

Efficacy outcomes are:
1. Participant rated Clinical Global Impression (CGI) [31] change category.
2. Anxiety measured by HADS-A subscale of the Hospital Anxiety and Depression Scale [32].
3. Depression measured by HADS-D subscale of the Hospital Anxiety and Depression Scale [32].
4. Six-Minute Walking Test [33].
5. Work and Social Adjustment measured by WSAS [34].
6. Participant Satisfaction (7-point item from very satisfied to very dissatisfied).
7. Centers for Diseases Control (CDC) Symptoms - Number of symptoms [35].
8. Jenkins sleep score [36].​
A selection of the above efficacy outcomes will be reported in the primary paper as required to aid interpretation of the primary outcomes;

The plan says all of these secondary outcomes will be run through the main analysis used for the primary outcomes (tweaked as necessary due to different types of data - continuous or categorical).
The intervention and time-by-intervention contrasts fitted for the primary outcomes will be extracted for each secondary efficacy outcome as outlined in the analyses of the primary outcomes.
Which is very thorough. Except they don't seem to have done it.

Going back to the objective above:
3. Are the differences across interventions in the primary outcomes associated with similar differences in secondary outcomes?
We know that for the sole objective outcome reported, the 6-minute walking test, the differences were not the same: CBT showed a 'moderate' gain in fatigue and function but no gain in walking distance relative to the control group, and GET showed a moderate gain in fatigue/function but only a small gain in walking distance.

Given that it was a stated objective to look at whether or not each of the secondary outcomes backed up the primary outcomes, I'm surprised it doesn't seem to have been reported in any of the papers. And they are not just talking about a quick look at the basic figures (as I did above) but detailed statistical models. Where are they? I can only hope someone will show me what I've missed.
 

Dolphin

Senior Member
Messages
17,567
What a lot of people in their published and unpublished letters (and comments in other fora) have done is contrasted some of the secondary results with the primary outcome measures (and figures for recovery, etc.). And we have been met with quite a bit of scorn for doing something they seemed to have planned to do.

It is frustrating they didn't publish the statistical plan on time. As they say themselves (as Simon highlighted):
Maximum transparency regarding what decisions were made a priori could be achieved by publishing the statistical analysis plan, which has been approved by the Trial Steering Committee (TSC), before the results of a study are known.
 

Simon

Senior Member
Messages
3,789
Location
Monmouth, UK
Could the disappointing overall results have been known, at least by statisticians, before unblinding?
No analyses of outcomes relating to this strategy have been, or will be, conducted prior to final written approval of the analysis strategy by the TSC. Reports have been prepared with data presented descriptively by intervention (coded to maintain blinding) for the closed sessions of the Data Monitoring Committee. Consequently, both DMC and TSC were blind to intervention group, as were the trial statisticians. Data cleaning will be performed as blind to intervention allocation as possible.
The PACE Trial protocol assumed large effects for CBT & GET on outcomes, and even results coded to maintain blinding would clearly show that NO groups had done very well and that the trial wasn't hitting its original targets. So if these reports for the DMC were prepared before the stats plan was finalised, as this implies - and if the reports included outcomes (which I assume they did) - then descriptive stats, which include means, would show the trial was struggling. That knowledge could potentially have influenced the decision to change how primary outcomes were reported.

This really hangs on a) preparation of blinded reports before the stats plan was finalised, and b) outcomes being included in the blinded reports (blinded reports simply don't show which group is which).
 
Messages
13,774
Could the disappointing overall results have been known, at least by statisticians, before unblinding?

The PACE Trial protocol assumed large effects for CBT & GET on outcomes, and even results coded to maintain blinding would clearly show that NO groups had done very well and that the trial wasn't hitting its original targets. So if these reports for the DMC were prepared before the stats plan was finalised, as this implies - and if the reports included outcomes (which I assume they did) - then descriptive stats, which include means, would show the trial was struggling. That knowledge could potentially have influenced the decision to change how primary outcomes were reported.

This really hangs on a) preparation of blinded reports before the stats plan was finalised, and b) outcomes being included in the blinded reports (blinded reports simply don't show which group is which).

This may have had an impact on the way in which new outcome measures were developed, but the treatments were non-blinded - anyone with any contact with patients would have known that CBT/GET were not resulting in the improvements claimed for them, even if they did not have access to the unblinded data.
 

biophile

Places I'd rather be.
Messages
8,977
Good work Simon! All their talk of timely transparency and scrutiny and comparing outcome measures was amusing. Were the TSC at the local pub when PACE came up with the 'normal range' and 'recovery'? Their justification for changing the threshold of normal/recovered has already been exposed as dodgy, as have some of their other claims, but now it also appears that their statistical analysis plan contradicts what they have done and claimed in later papers too.

One criticism from the instructor of the Coursera statistics course is for when studies don't include histograms. A lot of false trends get outed just by looking at the distribution of the pretty dots.

Indeed. I will bet everything I own that the so-called "recovered" participants' scores in physical function have a clearly different distribution compared to healthy age-matched controls.
 

Dolphin

Senior Member
Messages
17,567
The rationale for the trial is outlined in the protocol [13] and main clinical paper [14]. To be brief, chronic fatigue syndrome is characterised by chronic disabling fatigue in the absence of an alternative diagnosis, present in 0.2 to 2.6% of the population. The National Institute for Health and Clinical Excellence (NICE, UK) recommends two treatments: cognitive behaviour therapy (CBT) and graded exercise therapy (GET), but patient organisations recommend a third treatment: adaptive pacing therapy (APT). A definitive randomised trial was therefore needed to compare all three treatments with specialist medical care (SSMC) and to compare the established treatments (CBT, GET) against the new treatment (APT).
Given that the trial set out to be a definitive trial, there is all the more reason to pore over it, and not "move on" as Sean Lynch calls on us to do.
 

Dolphin

Senior Member
Messages
17,567
The secondary health economics objectives are as indicated below:

1. To compare care costs (including the costs falling to health service agencies, other agencies and also those borne by patients and their carers) and lost-employment costs between randomisation and 24 weeks for (i) CBT versus APT; (ii) GET versus APT; (iii) SSMC versus APT; (iv) CBT versus SSMC; (v) GET versus SSMC; and (vi) CBT versus GET.

2. To assess the relative cost-effectiveness and cost-utility of APT, CBT, GET, and SSMC (with costs based on health, social care, and informal care) up to 24 weeks.

3. To describe the annual healthcare and societal costs at baseline and their association with clinical and demographic characteristics.

4. To describe and compare patterns of service utilisation up to 24 weeks and up to 52 weeks across the four interventions.

5. To identify patient characteristics which predict service costs for each intervention.

6. To identify patient characteristics which predict cost-effectiveness/cost-utility up to 24 weeks, and up to 52 weeks for each intervention.
Looking back quickly at McCrone et al. (2012) http://www.plosone.org/article/info:doi/10.1371/journal.pone.0040808 , I think virtually none of this was done.

There doesn't seem to be much/anything reported for 24 weeks.

Nor do I see anything on patient characteristics etc. (i.e. 3, 5 & 6).