PACE Trial and PACE Trial Protocol

oceanblue

Guest
Messages
1,383
Location
UK
And on a closely linked subject...
It seems to me that a score of 12 or above on the Chalder questionnaire would indicate a trend to an overall worsening of health...
If, on average, the answers tend towards "worse than usual", which would be the case for an average score of 12 or above (a score of 11 being neutral, and not indicating any change either way), then that would indicate an overall worsening of health.
So how can a score of 18 indicate a 'normal range', when it actually indicates an overall deterioration in health?

The original bimodal 'normal range' of '3 or less' did actually indicate an improving participant.
But the new Likert 'normal range' of '18 or less' includes participants who are worsening in overall health.

Do you think these interpretations are solid?
Actually, I think you are on to something here (even though the trial participants are scoring themselves relative to their pre-illness selves).

The key point is that the threshold of 18 was set on the basis of a general population, so that 18 is supposed to be within the normal range for the general population. So, pull someone out of the general population with a score of 18 and they have worsening fatigue on 7 out of 11 items - how is that 'normal'?
 

Dolphin

Senior Member
Messages
17,567
Actually, I think you are on to something here (even though the trial participants are scoring themselves relative to their pre-illness selves).

The key point is that the threshold of 18 was set on the basis of a general population, so that 18 is supposed to be within the normal range for the general population. So, pull someone out of the general population with a score of 18 and they have worsening fatigue on 7 out of 11 items - how is that 'normal'?
Technically they could get in with 4 questions where a symptom was worse/much worse than normal, but they would need to score "much worse than normal" on at least three of those four (i.e. 3×3 + 1×2 + 7×1 = 18). But I agree that people would not consider 18 a normal/healthy score. It's very similar to the SF-36 argument.
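The arithmetic above can be checked by brute force. This is a hedged sketch of my own, assuming the standard Chalder Fatigue Questionnaire structure (11 items, each Likert-scored 0-1-2-3, with bimodal scoring counting an item only when it is answered 2 or 3); the function name is mine, not anything from the trial.

```python
def min_bimodal_for_likert(likert_total, n_items=11):
    """Smallest bimodal score compatible with a given Likert total."""
    best = None
    # (n0, n1, n2, n3) = how many of the 11 items got each Likert value
    for n0 in range(n_items + 1):
        for n1 in range(n_items + 1 - n0):
            for n2 in range(n_items + 1 - n0 - n1):
                n3 = n_items - n0 - n1 - n2
                if n1 + 2 * n2 + 3 * n3 == likert_total:
                    bimodal = n2 + n3  # items answered 2 or 3 score 1 point
                    best = bimodal if best is None else min(best, bimodal)
    return best

print(min_bimodal_for_likert(18))  # -> 4, e.g. three 3s, one 2 and seven 1s
```

So a Likert score of 18 cannot come with a bimodal score below 4, which matches the 3×3, 1×2, 7×1 example.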
 

Bob

Senior Member
Messages
16,455
Location
England (south coast)
Actually, I think you are on to something here (even though the trial participants are scoring themselves relative to their pre-illness selves).

The key point is that the threshold of 18 was set on the basis of a general population, so that 18 is supposed to be within the normal range for the general population. So, pull someone out of the general population with a score of 18 and they have worsening fatigue on 7 out of 11 items - how is that 'normal'?

The Chalder questionnaire has strange idiosyncrasies...
It is impossible for even a fantastically healthy top-level athlete to score below 11, even if 'usual' means normal levels of fatigue when you are well.
It's definitely not a linear scale, which makes analysing the results confusing and complex. (But then I'm easily confused!)

I can't see how the Likert scale can work, because it lumps good results in with bad results: you get points for feeling well and points for feeling ill ("no more than usual" is a good result, but it is still awarded a point), whereas the bimodal scoring only gives you points for feeling ill ("more than usual" and "much more than usual").
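That contrast between the two scoring methods can be sketched like this (my own minimal illustration, assuming the usual CFQ response wording; not code from the trial):

```python
LIKERT = {"less than usual": 0, "no more than usual": 1,
          "more than usual": 2, "much more than usual": 3}

def likert_total(answers):
    return sum(LIKERT[a] for a in answers)

def bimodal_total(answers):
    # bimodal scoring only awards a point for feeling ill
    return sum(1 for a in answers if LIKERT[a] >= 2)

# A completely healthy responder answering "no more than usual" throughout:
healthy = ["no more than usual"] * 11
print(likert_total(healthy), bimodal_total(healthy))  # -> 11 0
```

On the Likert scale, the floor for someone with ordinary healthy fatigue levels is 11; on the bimodal scale it is 0.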

Anyway, I can't see that this is significant or helpful for our work... I'm just making an observation.
 

Sam Carter

Guest
Messages
435
...

The key point is that the threshold of 18 is set on the basis of a general population so that for the general population 18 is supposed to be within the normal range. ..

Hi OB,

I haven't followed this thread closely so I may be repeating a point already made, but Peter White's figure of 18 as the upper bound of normal for a Likert-scored CFQ is categorically wrong -- a person with a Likert score of 18 has a bimodal score of at least 4 (and almost certainly more) and therefore has abnormal levels of fatigue according to the trial protocol and the established literature. He can't have it both ways: if he is going to use the CFQ he has to use it honestly and can't rewrite the rule book to big up his results. The Lancet peer reviewers should have been all over this kind of misdirection.

ETA: just as they should have been all over Bleijenberg and Knoop for saying "PACE used a strict criterion for recovery: a score on both fatigue and physical function within the range of the mean plus (or minus) one standard deviation of a healthy person’s score". For fatigue, at least, this is just untrue -- not a little bit wrong, or open-to-interpretation wrong, but demonstrably incorrect. At the very least The Lancet has to issue a correction on this point.
 

Dolphin

Senior Member
Messages
17,567
The Chalder questionnaire has strange idiosyncrasies...
It is impossible for even a fantastically healthy top-level athlete to score below 11, even if 'usual' means normal levels of fatigue when you are well.
It's definitely not a linear scale, which makes analysing the results confusing and complex. (But then I'm easily confused!)

I can't see how the Likert scale can work, because it lumps good results in with bad results: you get points for feeling well and points for feeling ill ("no more than usual" is a good result, but it is still awarded a point), whereas the bimodal scoring only gives you points for feeling ill ("more than usual" and "much more than usual").

Anyway, I can't see that this is significant or helpful for our work... I'm just making an observation.
I think the best way to think about it is that 11 is a normal score. Indeed, healthy populations have averages very close to that. So it can make at least some sense.
 

Dolphin

Senior Member
Messages
17,567
Significance for secondary outcome measures was set at 0.01 in the protocol

In the protocol, we were told:
http://www.biomedcentral.com/1471-2377/7/6
Results from all analyses will be summarised as differences between percentages or means together with 95% confidence limits (CL). The significance level for all analyses of primary outcome variables will be P = 0.05 (two-sided); for secondary outcome variables, P = 0.01 (two-sided) unless profiles of response can be specified in advance.

In the final paper, there is no mention of this threshold.

Table 6 shows other secondary outcomes. At 52 weeks, participants in the CBT and GET groups had better outcomes than did participants in the APT and SMC groups for work and social adjustment scores, sleep disturbance (CBT wasn't better than APT or SMC for sleep using the 0.01 threshold), and depression (with the one exception that GET was no different from APT for depression) (CBT also wasn't different from APT for depression using the 0.01 threshold). Anxiety was lower after CBT and GET than it was after SMC (it wasn't for GET, using the 0.01 threshold), but not than after APT. There were fewer chronic fatigue syndrome symptoms after CBT than there were after SMC (not using the 0.01 threshold). Poor concentration and memory did not differ between groups. Postexertional malaise was lower after CBT and GET than it was after APT and SMC (it was not different between CBT and either APT or SMC using the 0.01 threshold). 6-min walking distances were greater after GET than they were after APT and SMC, but were no different after CBT compared with APT and SMC. There were no differences in any secondary outcomes between APT and SMC groups (webappendix pp 6-9).

Wording in the discussion would also have to change.

The 0.01 threshold (99% confidence intervals rather than 95% confidence intervals) would similarly have meant that there would have been a lot more of the bars in pages 6-9 of the Web Appendix passing over the 0 (no difference) line.
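The point about wider intervals can be illustrated numerically. This is a rough, hypothetical sketch only: the group difference and standard error below are made-up numbers, not PACE data, and it assumes approximate normality purely for illustration.

```python
from statistics import NormalDist

def ci(estimate, se, level):
    """Two-sided normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (estimate - z * se, estimate + z * se)

diff, se = 3.0, 1.4  # hypothetical group difference and standard error
print(ci(diff, se, 0.95))  # roughly (0.26, 5.74): excludes 0
print(ci(diff, se, 0.99))  # roughly (-0.61, 6.61): crosses the 0 line
```

The same estimate that looks 'significant' with a 95% interval can cross zero once the interval is widened to 99%, which is the effect being described above.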
 

Dolphin

Senior Member
Messages
17,567
RE: Peter White/Barts "can't" complain the PACE Trial was too short

http://forums.aboutmecfs.org/showth...Trial-Protocol&p=163274&viewfull=1#post163274
--------
Not as definite as the other quote, but it is in the official manual:
GET participant manual
http://www.pacetrial.org/docs/get-participant-manual.pdf

What will my GET programme consist of?

[..]

(in a box)
This process may take anywhere from weeks to **months** - the process is slow and steady; patience and keeping your brakes on may be just as important as increasing activity.
 

Dolphin

Senior Member
Messages
17,567
For CDC criteria - they only asked about symptoms in the last week!!


Looking at the (unpublished) PACE Trial protocol (i.e. the long version):

Appendix 6: Case Report Forms
A6.4 CDC
CDC

Please score whether you have had any of the following symptoms in the last week:

Score each symptom by putting a circle round the number that most closely resembles the frequency and intensity of that particular symptom.

Symptoms

not present at all
present a little
present more often than not
present most of the time
present all of the time

Impaired memory or Concentration

Sore throat

Tender lymph node (glands) in your neck or under your arms

Muscle pain

Joint pain in several joints without swelling or redness

New headache

Unrefreshing sleep

Feeling ill after exertion
The CDC criteria ask for symptoms over the last 6 months. People can have temporary symptoms for all sorts of reasons, e.g. Pre-Menstrual Syndrome or similar.

And I'm guessing that "present a little" may have counted as having it.
 

Dolphin

Senior Member
Messages
17,567
Speculative point on the Clinical Global Impression (CGI)

Clinical Global Impression (CGI):

(I put "your health" in bold)

Overall, how much do you feel your health has changed since the start of
the study? Please tick the one box below that most closely corresponds to
how you feel now.

1. Very much better
2. Much better
3. A little better
4. No change
5. A little worse
6. Much Worse
7. Very much worse
There is a lot of "exercise propaganda" in society - exercise is good for you. Then the CBT and GET participants get more of it as part of their treatment. I wonder whether this could influence them when answering questions rating their health, i.e. because they now have a regular enough walking routine (those that do, that is), would they feel that their health has improved even if they are still not working, or not working much, and by most people's standards aren't healthy?

I've read descriptions of people who have done some of such programs who seem a bit obsessed with going for their walk as if it by itself was the answer.
 

Dolphin

Senior Member
Messages
17,567
One could have a severe "non-serious adverse event"

If one looks at Table 4 [Safety outcomes] there is a category, "Non-serious adverse events"
There are huge numbers for each type:
949 848 992 977

I think a lot of people have not paid much attention to them because of the title.
However, the title may be slightly misleading.

Indeed, if one looks at A.6.35 Non-serious adverse event report log in the (unpublished) PACE Trial protocol (i.e. the long version), there is a Severe category!!

Please rate the severity of the event
If unsure or concerned, consult with Centre Leader
(Mild/Moderate/Severe)
 

Snow Leopard

Hibernating
Messages
5,902
Location
South Australia
Has anyone read this:

http://informahealthcare.com/doi/abs/10.3109/09638288.2010.503256
"Measuring substantial reductions in functioning in patients with chronic fatigue syndrome"
Jason et al.

I wonder whether this could influence them when answering questions rating their health, i.e. because they now have a regular enough walking routine (those that do, that is), would they feel that their health has improved even if they are still not working, or not working much, and by most people's standards aren't healthy?

Yes, I agree, and that is why I believe a 6-minute walking test is hardly objective if it does not consider the overall impact on symptoms and activity levels over the next few days.
 

Dolphin

Senior Member
Messages
17,567
(Not important) It's hard to dissatisfy people! (sort of)

(last observation from me for the moment I think)


The authors state:
Our findings were strengthened

Overall, how satisfied are you with the treatment you received?

Very satisfied
Moderately satisfied
Slightly satisfied
Neither
Slightly dissatisfied
Moderately dissatisfied
Very dissatisfied

Please add any additional comments below:

At 52 weeks, participants rated satisfaction with treatment
received on a 7-point scale, condensed into three categories
to aid interpretation (satisfied, neutral, or dissatisfied).

APT CBT GET SMC

Satisfied with treatment 128 (85%) 117 (82%) 126 (88%) 76 (50%)

Dissatisfied with treatment 4 (3%) 7 (5%) 2 (1%) 17 (11%)
So only 11% of the SMC people (who really didn't get much: http://www.pacetrial.org/docs/ssmc-doctor-manual.pdf) were dissatisfied.

It might have been interesting to see the "slightly dissatisfied" group, although for the first three it still probably wasn't big. (They don't have data for some people, of course, who would probably be more likely to say they were dissatisfied.)

Missing for APT: 8-9 (out of 159)
Missing for CBT: 18-19 (out of 161)
Missing for GET: 16-17 (out of 160)
Missing for SMC: 7-9 (out of 160)

If one included those in the dissatisfaction figures e.g. as part of a sensitivity analysis, the figures might look worse.
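That sensitivity analysis is easy to sketch. This is a hedged illustration only: the dissatisfied counts and the upper ends of the missing-data ranges are taken from the figures above (reading the second "Missing for APT" line as SMC's), and treating every missing participant as dissatisfied is a deliberate worst case, not anything the trial reported.

```python
groups = {
    #       (dissatisfied, missing upper estimate, randomised)
    "APT": (4, 9, 159),
    "CBT": (7, 19, 161),
    "GET": (2, 17, 160),
    "SMC": (17, 9, 160),
}

for name, (dissat, missing, n) in groups.items():
    worst = 100 * (dissat + missing) / n
    print(f"{name}: up to {worst:.1f}% dissatisfied in the worst case")
```

Under this worst case, CBT's dissatisfaction rate (about 16%) would approach SMC's, rather than looking far lower.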
 

anciendaze

Senior Member
Messages
1,841
The 0.01 threshold (99% confidence intervals rather than 95% confidence intervals) would similarly have meant that there would have been a lot more of the bars in pages 6-9 of the Web Appendix passing over the 0 (no difference) line.
Confidence levels only make sense if you are dealing with normal distributions for which you have reasonably consistent parameters. This is not true. My example of sampling a population with a uniform distribution was intended to illustrate this. You could think of a uniform distribution as a piece of a normal distribution with an arbitrarily large SD. You could assign a wide range of values for population SD here. The distribution is not Gaussian (normal). What you are highlighting here is inconsistency. The underlying absurdity remains. None of those confidence values are meaningful.
If one looks at Table 4 [Safety outcomes] there is a category, "Non-serious adverse events"
There are huge numbers for each type:
949 848 992 977...
These were numbers which caught my eye earlier. No matter what those running the trial decided about seriousness, the patients considered these negative events. It would be surprising if there were no correlation between adverse events and dropouts.

This is not simply a matter of having those in a lower range drop out. All participants were far from healthy. People drop out based on expectations of outcomes. Those who felt they were trending downward, or bouncing up and down with no real effect, would be most likely to quit. Those who felt they could cross an upper bound were more likely to make an effort to stick it out.

It would only take a modest effect of this type to produce the minimal apparent gains reported. The above numbers are all it would take.
 

Dolphin

Senior Member
Messages
17,567
The 0.01 threshold (99% confidence intervals rather than 95% confidence intervals) would similarly have meant that there would have been a lot more of the bars in pages 6-9 of the Web Appendix passing over the 0 (no difference) line.
Confidence levels only make sense if you are dealing with normal distributions for which you have reasonably consistent parameters. This is not true. My example of sampling a population with a uniform distribution was intended to illustrate this. You could think of a uniform distribution as a piece of a normal distribution with an arbitrarily large SD. You could assign a wide range of values for population SD here. The distribution is not Gaussian (normal).
Confidence intervals (CIs) are what they themselves call them underneath.
I think I get your point that one might not know how a variable is distributed. (Technically, I imagine one can talk about confidence intervals for distributions other than Gaussian ones.)
What you are highlighting here is inconsistency. The underlying absurdity remains.
I never said otherwise. But one can't write in and say "your article is absurd" or "you fiddled the books"/"you didn't do what you said you would in the protocol" or whatever without making specific points.
They said they would use a p<0.01 threshold but effectively switched to a p<0.05 threshold.
This wasn't highlighted in the text.
 

oceanblue

Guest
Messages
1,383
Location
UK
In the protocol, we were told:
http://www.biomedcentral.com/1471-2377/7/6

for secondary outcome variables, P = 0.01 (two-sided) unless profiles of response can be specified in advance.

In the final paper, there is no mention of this threshold.
Wording in the discussion would also have to change.

The 0.01 threshold (99% confidence intervals rather than 95% confidence intervals) would similarly have meant that there would have been a lot more of the bars in pages 6-9 of the Web Appendix passing over the 0 (no difference) line.

The next para in the protocol says this:
Prior to writing the Analysis Strategy a consensus will be reached on the profiles of response for each secondary outcome within each of the four treatment groups.
Now, since I'm not even sure what a 'profile of response' is, it's hard for me to interpret this. But it does look like they might have set themselves up to 'specify the profiles of response' in advance and so allow use of the 95% confidence interval. Though to be honest, I haven't a clue what they're on about.
 

anciendaze

Senior Member
Messages
1,841
Confidence intervals (CIs) are what they themselves call them underneath.
I think I get your point that one might not know how a variable is distributed. (Technically, I imagine one can talk about confidence intervals for distributions other than Gaussian ones.)
It might be possible to talk about confidence intervals for other distributions, but this is so unusual I would expect a bibliographic reference to mathematical research. The underlying theory of distributions requires familiarity with things like measure theory, Borel sets and Lebesgue integrals. Most mathematical distributions turn out to be nowhere differentiable. If Riemann integrals scare you, these things are serious hurdles. The overwhelming likelihood is that the assumption of Gaussian distributions is so deeply buried no one even imagined questioning it.
But one can't write in and say "your article is absurd" or "you fiddled the books"/"you didn't do what you said you would in the protocol" or whatever without making specific points.
No argument there. I keep trying to find a way to push them into stating one of those unfounded assumptions on which this whole mess rests. If you artificially create a cluster through the sampling process, no one should be surprised that it spreads out afterward. At this point, the influence of those natural and artificial bounds takes over. Those subjects moving down are subject to different influences from those moving up, irrespective of treatment. The problem is how to get this across to possessors of invincible ignorance.
 

Dolphin

Senior Member
Messages
17,567
PACE trial results - ME Association writes to the Science Media Centre

Earlier, a press release on the PACE trial from the Science Media Centre: http://www.sciencemediacentre.org/pages/press_releases/11-02-17_cfsme_trial.htm

The (UK) ME Association have now sent a letter to Science Media Centre regarding press coverage of the PACE trial results: http://www.meassociation.org.uk/?p=5041

There is a specific thread to discuss this: http://forums.aboutmecfs.org/showth...eting-with-Science-Media-Centre-on-PACE-trial
so best to discuss it there I think as this thread is quite busy.
 

Dolphin

Senior Member
Messages
17,567
(Just a bit of a moan)

The next para in the protocol says this:
Prior to writing the Analysis Strategy a consensus will be
reached on the profiles of response for each secondary
outcome within each of the four treatment groups.
Thanks - missed that

Now, since I'm not even sure what a 'profile of response' is, it's hard for me to interpret this. But it does look like they might have set themselves up to 'specify the profiles of response' in advance and so allow use of the 95% confidence interval. Though to be honest, I haven't a clue what they're on about.
Ok. Are they claiming they thought GET and CBT would come out better on everything? That doesn't seem like clinical equipoise (if one is being awkward). They probably should have said when the threshold was being set at 0.01. Was it 0.01 when APT was compared against SMC (to make it hard for APT to seem better than SMC)? It would be dodgy if it was. I would presume they were saying at least some of them weren't 0.05. What's the point of mentioning the threshold of 0.01 if you're not going to use it? And if you stick to convention, you shouldn't use 0.05 for lots and lots of comparisons (I don't necessarily agree with the convention, but they were the people who brought up 0.01).

Also, they said:
http://www.biomedcentral.com/1471-2377/7/6
Differential outcomes
Because CBT and GET are both based on a graded exposure to activity, they may preferentially reduce disability, whilst APT, being based on the theory that one must stay within the limits of a finite amount of "energy", may reduce symptoms, but at the expense of not reducing disability. By measuring both symptoms and disability as our primary outcomes, we will be able to test a secondary hypothesis that these treatments may differentially affect symptoms and disability.
In this case, they were saying fatigue was the symptom measure, while some of the secondary outcome measures were specific symptoms. So one might think that for symptoms they might say they don't know and use 0.01.
 

Dolphin

Senior Member
Messages
17,567
SCID used to diagnose psychiatric disorders - so there is a high rate

The rate of psychiatric disorders in the PACE Trial was 47%. Reading some papers, that might not seem high.

But some papers use the CDC empiric criteria (rubbish) or other methods to diagnose psychiatric disorders, such as questionnaires, which may not be accurate because, for example, symptoms like fatigue, sleep problems and concentration/cognitive problems can be seen as evidence of psychiatric problems.

In their 1998 book, Friedberg & Jason point out that the SCID (used in the PACE Trial) finds lower rates of psychiatric disorders in CFS than other screening methods such as the DIS. The figures they quote for SCID studies are:
- Hickie et al. (1990) 24.5%;
- Lloyd et al. (1990) 21%;
and
- Taylor & Jason (1998) 22%.
So the rates of current psychiatric disorders in PACE Trial patients (47%) are quite high.
 

oceanblue

Guest
Messages
1,383
Location
UK
And if you stick to convention, you shouldn't use 0.05 for lots and lots of comparisons (I don't necessarily agree with the convention, but they were the people who brought up 0.01).

As I said, I'm not quite sure what exactly they are on about here, but I suspect it's something to do with post hoc comparisons, i.e. if they are looking at any possible comparison (post hoc) there may be a convention that they need to use p<0.01 (which might explain why they feel the need to mention it). But by specifying in advance which comparisons are key, they can go back to using p<0.05. As they didn't spell out which comparisons they had decided upon, and I suspect they won't tell us, it's probably not going to be fruitful to pursue this one. But it does give the impression they are making things up as they go along to suit.
 
Back