PACE Trial and PACE Trial Protocol

Bob

Senior Member
Messages
16,455
Location
England (south coast)
You can take the SF-36 here; NB in the UK version a "block" is a hundred yards. More info on the scale. NB it measures physical activity (/disability), not fatigue. It's been widely used in ME research by researchers of all persuasions.

The question on moderate activity is bizarre: vacuuming and a round of golf in the same statement? But the other questions, I think, are fair enough.

The biggest issue with the SF-36 is that it is a subjective measure, i.e. it's what people report they can do, which might not be quite the same as what they can actually do, and could be influenced by a desire to please researchers and/or a strong relationship with their therapist, or even an over-optimistic view of what they can do as a result of 'successful' CBT.

Thanks for that, ocean... I remember now that Cort mentioned this scale in his article... It seems like a ridiculously inappropriate scale to measure ME patients... Thanks for the info.
 

anciendaze

Senior Member
Messages
1,841
Afraid I will have to bow out of this discussion temporarily. What's going on at CROI needs urgent attention, and, if that weren't enough, I am having a flare-up of symptoms of peripheral neuropathy (burning feet).

My own dealings with doctors will demand careful attention. I've just realized that what I was told about Lyme disease being ruled out was just plain wrong. There are hundreds of well-confirmed cases in this state in people who did not travel to a region where it is prevalent.
 

Hope123

Senior Member
Messages
1,266
OK, I haven't read through all these pages but re: SF-36, here is a short piece on how it is scored:

http://www.chiro.org/LINKS/OUTCOME/How_to_score_the_SF-36.pdf
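
In case the PDF is awkward to open, here's a rough sketch of how the physical functioning (PF-10) subscale is scored, as I understand the standard method (a toy illustration only; the document above is the authority):

```python
# Rough sketch of SF-36 physical functioning (PF-10) scoring as I understand the
# standard method: 10 items, each answered 1 = "limited a lot",
# 2 = "limited a little", 3 = "not limited at all".
# The raw sum (10-30) is rescaled to 0-100; higher = better physical function.

def sf36_pf_score(item_responses):
    """item_responses: list of 10 integers, each 1, 2 or 3."""
    if len(item_responses) != 10 or any(r not in (1, 2, 3) for r in item_responses):
        raise ValueError("expected 10 responses coded 1-3")
    raw = sum(item_responses)          # 10..30
    return (raw - 10) / 20 * 100       # 0..100, in steps of 5

# Invented example: "limited a lot" on vigorous activities, fine otherwise.
print(sf36_pf_score([1, 3, 3, 3, 3, 3, 3, 3, 3, 3]))   # 90.0
```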

In addition, is there a statistician in the house? Can someone comment on the computer-generated allocation and the linear regression models? My concern has been whether subgroup analyses by different ME/CFS criteria are fair with the methods given.
 

Bob

Senior Member
Messages
16,455
Location
England (south coast)
OK, I haven't read through all these pages but re: SF-36, here is a short piece on how it is scored:

http://www.chiro.org/LINKS/OUTCOME/How_to_score_the_SF-36.pdf

In addition, is there a statistician in the house? Can someone comment on the computer-generated allocation and the linear regression models? My concern has been whether subgroup analyses by different ME/CFS criteria are fair with the methods given.

It's probably not exactly what you are looking for, but there is a very short discussion about a certain aspect of the statistics in the PACE trial, here:
http://forums.aboutmecfs.org/showth...the-PACE-trial-with-a-random-number-generator
(I've got no understanding of statistics at all, so I've no idea if it answers any of your questions!)
 

oceanblue

Guest
Messages
1,383
Location
UK
That recovery threshold in a picture

[attached image sf36pic.gif: SF-36 score distribution with the recovery threshold marked]

sorry it's so small (click to enlarge slightly).

NB for the working-age population, which would be the relevant comparison for PACE, the 'tail' would be even smaller (and of course the tail is made up of the sick).

This is taken from page 9 of the open access article (notations added by me): Bowling SF-36 normative data

Although the article and picture are freely available there may be copyright issues so please don't reproduce this pic.
 

Marco

Grrrrrrr!
Messages
2,386
Location
Near Cognac, France
'Nuggets' from the PACE trial full paper and webappendix.


These are my interpretations, and I suggest you check them against the papers and judge for yourself.

For brevity I have not included the details. I understand the papers are now available in the library.

Chalder Fatigue Questionnaire


One of the primary outcome measures is fatigue as measured by the Chalder fatigue questionnaire (CFQ). The paper states that entry criteria for PACE included being assessed as having a CFQ score of 6 or above (0 to a maximum of 11) using the bimodal scoring system. The paper also states that in the outcomes analysis, Likert scoring (items scored 0, 1, 2, 3; range 0-33, lowest = least fatigue) was substituted for bimodal scoring. Self-rated outcomes were assessed at 12, 24 and 52 weeks.
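
To make the two scoring systems concrete, here's a quick sketch (my own toy illustration with invented responses, not anything from the trial):

```python
# Chalder Fatigue Questionnaire: 11 items, each answered on a 4-point scale
# coded 0-3 (0 = least fatigue ... 3 = most fatigue).
# Likert score  = sum of the 0-3 codes                                (range 0-33)
# Bimodal score = each item collapsed to 0/1 (0,1 -> 0; 2,3 -> 1), summed (range 0-11)

def cfq_likert(responses):
    return sum(responses)

def cfq_bimodal(responses):
    return sum(1 for r in responses if r >= 2)

responses = [2, 3, 2, 2, 3, 2, 1, 2, 2, 3, 2]   # invented respondent, 11 items

print(cfq_likert(responses))          # 24 out of 33
print(cfq_bimodal(responses))         # 10 out of 11
print(cfq_bimodal(responses) >= 6)    # True -> would meet the trial's entry criterion of 6+
```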

Amongst the criticisms of the CFQ is that, along with other commonly used fatigue scales, it is prone to ceiling effects:

http://www.springerlink.com/content/p8g2h76327n08151/

“Extreme scoring occurred on a large number of the items for all three recommended fatigue rating scales across several studies. The percentage of items with the maximum score exceeded 40% in several cases. The amount of extreme scoring for a certain scale varied from one study to another, which suggests heterogeneity in the selected subjects across studies.

Because all three instruments easily reach the extreme ends of their scales on a large number of the individual items, they do not accurately represent the severe fatigue that is characteristic for CFS. This should lead to serious questions about the validity and suitability of the Checklist Individual Strength, the Chalder Fatigue Scale, and the Krupp Fatigue Severity Scale for evaluating fatigue in CFS research.”

In this respect, the same author states that the bimodal scoring version of the CFQ is even more prone to ceiling effects.


The practical outcome of this recognised problem is that many respondents will score at the scale maximum at the outset, and the instrument will be unable to detect any exacerbation in fatigue (that is, an adverse effect). In addition, if the bimodal scoring version of the CFQ is more prone to ceiling effects (that is, extreme scores) than the continuous scoring version, then it logically follows that the bimodal version is prone to exaggerating the level of reported fatigue.
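
A toy illustration of that ceiling, with invented numbers and the same item coding as the sketch above:

```python
# A severely fatigued respondent who is already at the bimodal ceiling at baseline,
# and who then genuinely deteriorates on every item.
bimodal = lambda items: sum(x >= 2 for x in items)   # 0,1 -> 0; 2,3 -> 1
likert  = lambda items: sum(items)                   # 0-3 per item

baseline = [2] * 11   # already scoring 2 on all 11 items
worse    = [3] * 11   # scoring 3 on all 11 items

print(bimodal(baseline), bimodal(worse))   # 11 11 -> the deterioration is invisible
print(likert(baseline),  likert(worse))    # 22 33 -> the deterioration shows up
```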

It is highly likely, if not unavoidable, that rating the same subjects at the same point in time using the continuous scoring version, compared with bimodal scoring, would result in a lower overall fatigue score across the board.

Over two points in time, you would likewise expect that patients' fatigue scores assessed at baseline using bimodal scoring would rate fatigue higher than the same patients' ratings at the first outcome point, at 12 weeks, using continuous scoring. In other words self-rated fatigue would be seen to decline between the two points solely as a function of the revised scoring system, regardless of intervention.


Table 3 and figure 2 of the PACE paper show CFQ results for each treatment arm and participants grouped by diagnostic label. It can be seen that, regardless of treatment arm or diagnostic label, fatigue scores in all cases fell by between 4 and 6 points from baseline to 12 weeks and this magnitude of change was not seen for any group between 12 and 24 weeks or 12 and 52 weeks.

In conclusion, the majority of the improvement in fatigue scores for all treatment arms occurred between baseline and 12 weeks and was likely due to post hoc changes made to the scoring system used.


More to follow.
 

Marco

Grrrrrrr!
Messages
2,386
Location
Near Cognac, France
[attached image: SF-36 score distribution with the recovery threshold marked]

sorry it's so small (click to enlarge slightly).

NB for the working-age population, which would be the relevant comparison for PACE, the 'tail' would be even smaller (and of course the tail is made up of the sick).

This is taken from page 9 of the open access article (notations added by me): Bowling SF-36 normative data

Although the article and picture are freely available there may be copyright issues so please don't reproduce this pic.


A fine graphic example of why you can't and shouldn't quote parametric statistics like mean and standard deviation for a skewed population distribution.
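
For anyone who'd rather see the problem than take it on trust, here's a toy simulation in Python (an invented left-skewed 0-100 score of roughly the right shape; not the Bowling data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented left-skewed 0-100 "population": most people near the ceiling,
# a small sick tail at the bottom (not the Bowling data, just the same shape).
scores = np.clip(100 - rng.exponential(scale=12, size=100_000), 0, 100)
scores = np.round(scores / 5) * 5          # SF-36 PF scores move in steps of 5

mean, sd = scores.mean(), scores.std()
threshold = mean - sd                      # the "mean minus 1 SD" rule

print(f"mean = {mean:.1f}, sd = {sd:.1f}, mean - 1 SD = {threshold:.1f}")
print(f"median = {np.median(scores):.0f}")
print(f"share of the population at or above the threshold: {(scores >= threshold).mean():.0%}")
# Because of the skew, the long tail inflates the SD and drags the mean down,
# so the 'mean - 1 SD' threshold sits far below the typical (median) score.
```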
 

oceanblue

Guest
Messages
1,383
Location
UK
Marco
In other words self rated fatigue would be seen to decline between the two points solely as a function of the revised scoring system regardless of intervention.
I haven't got the paper in front of me so I might have this wrong, but I thought they used bimodal scoring for the entry criteria, while Likert scoring was used for all assessments, including baseline?

I completely agree that ceiling effects would make it hard to detect deterioration in fatigue and hence adverse effects, especially given how high the baseline fatigue means were (goodness knows what the results would be like if they'd included patients who were not well enough to travel to a centre for 1 hour of therapy and attempt a 6-minute walking test).
 

Sean

Senior Member
Messages
7,378
[attached image: SF-36 score distribution with the recovery threshold marked]

sorry it's so small (click to enlarge slightly).

NB for the working-age population, which would be the relevant comparison for PACE, the 'tail' would be even smaller (and of course the tail is made up of the sick).

A good diagram is worth a 1000 carefully chosen words.

If this doesn't drive home the point about the unreal definitions of normal and recovery they indulge in, I don't know what will.

How do you translate that into a letter to a journal? Will they publish a diagram in the letters page?
 

Dolphin

Senior Member
Messages
17,567
A good diagram is worth a 1000 carefully chosen words.

If this doesn't drive home the point about the unreal definitions of normal and recovery they indulge in, I don't know what will.

How do you translate that into a letter to a journal? Will they publish a diagram in the letters page?
It would be called a left-skewed distribution.
One adjective that could be added is "strongly left-skewed". I'm not sure whether "strongly" has a precise numerical value - probably not; one could probably get away with using it in a lay sense. A similar wording is "heavily left-skewed". There might be other adjectives.
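
If a number helps, the usual way to put a figure on it is the sample skewness coefficient; a common rule of thumb (not a formal cut-off) treats |skewness| greater than 1 as highly skewed. A quick sketch on invented data of the same general shape:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
# Invented left-skewed 0-100 scores (bulk near the ceiling, tail at the bottom).
scores = np.clip(100 - rng.exponential(scale=12, size=100_000), 0, 100)

print(f"sample skewness = {skew(scores):.2f}")   # strongly negative -> long left tail
```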
 

oceanblue

Guest
Messages
1,383
Location
UK
A fine graphic example of why you can't and shouldn't quote parametric statistics like mean and standard deviation for a skewed population distribution.
Don't suppose you have any references for that point? A couple of other people (outside the forum) have said the same thing, suggesting the 'mean - 1SD' formula cannot be applied to such a skewed population.
 

Marco

Grrrrrrr!
Messages
2,386
Location
Near Cognac, France
Marco

I haven't got the paper in front of me so I might have this wrong, but I thought they used bimodal scoring for the entry criteria, while Likert scoring was used for all assessments, including baseline?

Thank you for the note of caution.

This is what they say on page 3

"Before outcome data were examined, we changed the original bimodal scoring of the Chalder fatigue questionnaire (range 0-11) to Likert scoring to more sensitively test our hypothesis of effectiveness"

This is ambiguous.

Certainly, the data for baseline is presented as Likert scores, which could mean:
(a) participants were reassessed at baseline using Likert scoring, and Likert was used thereafter;
(b) the bimodal scores collected at baseline were transformed to Likert, and Likert was used thereafter; or
(c) data was collected throughout using bimodal scoring but converted to Likert for analysis.

(b) would be akin to checking the temperature using a faulty thermometer, converting from Fahrenheit to Celsius, and then comparing the baseline in Celsius to later temperatures in Celsius taken with a properly calibrated thermometer - the scenario I'm suggesting.

(c) seems a bizarre thing to do as the sensitivity would be determined by the scoring method used to collect the data.

(a) would be likely to result in further applicants being excluded at the baseline stage if they now scored too low on the CFQ as scored using Likert. There is no evidence this happened.

We may never know but I'd like to.
 

oceanblue

Guest
Messages
1,383
Location
UK
It would be called a left-skewed distribution.
One adjective that could be added is "strongly left-skewed". I'm not sure whether "strongly" has a precise numerical value - probably not; one could probably get away with using it in a lay sense. A similar wording is "heavily left-skewed". There might be other adjectives.

This is from that Bowling paper:
Kolmogorov-Smirnov tests were also highly significant [K-S Z=11.75 (p<0.01)]... ...these results confirm the highly skewed nature of the distributions, which is a problematic feature of all health scales
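
For what it's worth, that sort of check is easy to run on any dataset you can get hold of. A sketch of a Kolmogorov-Smirnov test against a fitted normal (illustrative only, not the Bowling analysis):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Invented left-skewed 0-100 scores standing in for real SF-36 data.
scores = np.clip(100 - rng.exponential(scale=12, size=5_000), 0, 100)

# Compare the sample with a normal distribution fitted to its own mean and SD.
# (Fitting the parameters from the same data makes the test a bit anti-conservative;
# Lilliefors' correction deals with that, but the idea is the same.)
stat, p = stats.kstest(scores, 'norm', args=(scores.mean(), scores.std()))
print(f"K-S statistic = {stat:.3f}, p = {p:.2g}")   # tiny p -> clearly non-normal
```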
 

oceanblue

Guest
Messages
1,383
Location
UK
The data is the same for both Likert and bimodal scoring. Each question has 4 answers:
less than usual / no more than usual / more than usual / much more than usual, scored 0, 1, 2, 3 under Likert, and this is the data that's collected.
Bimodal uses the same data but collapses it into two scores:
0 and 1 both score 0; 2 and 3 both score 1, so each question is scored 0 or 1.

Bimodal and Likert scoring are both based on the same collected data, though that isn't obvious from the Lancet paper.
 

oceanblue

Guest
Messages
1,383
Location
UK
A good diagram is worth a 1000 carefully chosen words.
How do you translate that into a letter to a journal? Will they publish a diagram in the letters page?
Sadly, I don't think they will, and it isn't based on the exact same data as PACE used, though the pattern will be very similar.
 

Marco

Grrrrrrr!
Messages
2,386
Location
Near Cognac, France
The data is the same for both Likert and bimodal scoring. Each question has 4 answers:
less than usual / no more than usual / more than usual / much more than usual, scored 0, 1, 2, 3 under Likert, and this is the data that's collected.
Bimodal uses the same data but collapses it into two scores:
0 and 1 both score 0; 2 and 3 both score 1, so each question is scored 0 or 1.

Bimodal and Likert scoring are both based on the same collected data, though that isn't obvious from the Lancet paper.

Aha - I see. Red herring then unfortunately.

Just one slight concern remaining. It seems slightly counterintuitive that you would collect data recording one of four possible responses and then collapse that to a bimodal 0 or 1. For simplifying the statistical analysis presumably?
 

oceanblue

Guest
Messages
1,383
Location
UK
Aha - I see. Red herring then unfortunately.

Just one slight concern remaining. It seems slightly counterintuitive that you would collect data recording one of four possible responses and then collapse that to a bimodal 0 or 1. For simplifying the statistical analysis presumably?

Sadly so. Psychologists championed the bimodal scoring (I can't remember the reason) until last year's FINE trial bombed with bimodal scoring and a post-trial analysis suggested they would have found a small significant improvement using Likert scoring (sadly too little, too late). It's probably just a coincidence that PACE decided to switch from bimodal to Likert scoring at about the same time.
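
That fits the general statistical point: collapsing a 0-3 item to 0/1 throws away information, so a small shift that Likert scoring can detect may vanish under bimodal scoring. A toy simulation of that (an invented effect size and a made-up response model, nothing to do with the actual FINE or PACE data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulate_trial(shift, n=60):
    """Made-up model: 11 CFQ-style items per person from a latent fatigue level,
    rounded onto the 0-3 scale; 'shift' is a small treatment improvement."""
    def arm(mu):
        latent = rng.normal(loc=mu, scale=0.8, size=(n, 11))
        items = np.clip(np.round(latent), 0, 3)
        return items.sum(axis=1), (items >= 2).sum(axis=1)   # Likert, bimodal
    (ctl_likert, ctl_bi), (trt_likert, trt_bi) = arm(2.0), arm(2.0 - shift)
    return (stats.ttest_ind(trt_likert, ctl_likert).pvalue < 0.05,
            stats.ttest_ind(trt_bi, ctl_bi).pvalue < 0.05)

hits = np.array([simulate_trial(shift=0.15) for _ in range(500)])
print("improvement detected with Likert scoring :", hits[:, 0].mean())
print("improvement detected with bimodal scoring:", hits[:, 1].mean())
# Same underlying responses each time, but the dichotomised score picks up
# the improvement less often.
```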
 