• Welcome to Phoenix Rising!

    Created in 2008, Phoenix Rising is the largest and oldest forum dedicated to furthering the understanding of, and finding treatments for, complex chronic illnesses such as chronic fatigue syndrome (ME/CFS), fibromyalgia, long COVID, postural orthostatic tachycardia syndrome (POTS), mast cell activation syndrome (MCAS), and allied diseases.


PACE Trial and PACE Trial Protocol

Angela Kennedy

Senior Member
Messages
1,026
Location
Essex, UK
I don't know why I said "yellow" here!!! I meant red versus green and red versus purple-ish.
You are talking here about the sub-groups of patients with 'London', 'Reeves', 'psychiatric disorder' etc and their response to the treatments?

I'm trying to get help with a t-test on those groups themselves (independent of treatment)- but are you saying it might be worth a t-test on those with the allocated treatments as well?
 

Dolphin

Senior Member
Messages
17,567
You are talking here about the sub-groups of patients with 'London', 'Reeves', 'psychiatric disorder' etc and their response to the treatments?

I'm trying to get help with a t-test on those groups themselves (independent of treatment)- but are you saying it might be worth a t-test on those with the allocated treatments as well?
I'm saying that if we had the actual means and standard deviations for figures 2F and 2G, we might find no statistically significant difference between the SMC and CBT scores, or between the SMC and GET scores. Somebody said to me that one can tell just by looking that there is no difference; I'm still trying to think that through. The trial authors should be willing to release the means and standard deviations. If they don't, I suppose it might come down to measuring the means and SDs from the graphs as accurately as possible. I don't think the overall p-value necessarily tells one that there are no differences for individual comparisons (I'm unclear what the overall p-value is testing, but I've no reason to trust the authors or what they say it shows).
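For anyone who wants to try this once means and SDs are available (or have been measured off the graphs), here is a minimal sketch of a two-group comparison from summary statistics alone. It uses Welch's t-test, which is my choice and not something the paper specifies, and the p-value is a normal approximation that is only reasonable for large groups. The numbers in the example are made up for illustration and are NOT taken from figures 2F or 2G.

```python
from math import sqrt
from statistics import NormalDist


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and approximate degrees of freedom,
    computed from each group's mean, SD and size."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1's mean
    v2 = sd2 ** 2 / n2  # squared standard error of group 2's mean
    t = (mean1 - mean2) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def approx_two_sided_p(t):
    """Two-sided p-value using the normal distribution as a stand-in
    for the t distribution (an approximation, fine only for large df)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Hypothetical values only: two groups of 150 with the same SD.
t, df = welch_t_from_summary(50.0, 20.0, 150, 45.0, 20.0, 150)
p = approx_two_sided_p(t)
print(f"t = {t:.3f}, df = {df:.0f}, approximate p = {p:.3f}")
```

With small subgroups the normal approximation is too generous, so a proper t distribution (from a stats package) would be needed there.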

Great you're following it up.
 

Dolphin

Senior Member
Messages
17,567
Thanks for the summary Sean.

It is still important that we address the problem of the cohorts- we can't forget this issue, and here's why.

Lacklustre results notwithstanding, this trial has been spun, both in the Lancet and in the press, as substantiating the safety of CBT/GET for ME sufferers (even if they use their term 'CFS/ME'). These results are generally no more impressive than any CBT/GET trials done in the past, but that wasn't necessarily their primary aim. This appears to have been to dismiss the claim that CBT/GET is unsafe for ME sufferers, because this is an extremely serious allegation.

Now they've undertaken some outstanding ontological gerrymandering to do this, but they've managed to pooh-pooh concerns about safety, and THIS will be the serious issue people will be faced with in the future.

Showing how poor the results of this trial are is one thing (an important one). But it will still be business as usual unless we can show that CBT/GET is still potentially unsafe. The reasons we know CBT/GET has not been established as 'safe' for ME sufferers are:

1. The PACE cohorts have potentially eliminated all ME sufferers from the trial. AT BEST, very few will have got in. Maybe none at all were in there. If this happened, it will have been achieved at the doctor examination/history taking stage, which WAS ad hoc (the only standardised form was a sign off by the research nurse after the doctors had 'screened' the patients). It would be highly problematic for the doctors to include any people with neurological deficit in the trial- because those deficits may have represented other neurological illnesses, like MS etc. I believe it is significant that so many people attending the 'specialist clinics' (over 1000) were deemed not to have met Oxford. Previous 'CFS' research cohorts in the UK have been worked so that people with organic dysfunction seen in ME (say Canada, even the historical ME case descriptions etc.) are excluded from these.

2. There is uncertainty over how 'Reeves' were used. On one table they place them as a sub-group of the cohort (which might lead one to believe they were inclusionary criteria performed after Oxford). But the text on page 2 shows that Reeves were used for exclusionary purposes (to "exclude alternative diagnoses") along with NICE (those are the two references given here). There is no literature on the PACE protocol that I can see that sets out standardisation of Fukuda (or Reeves) inclusion or exclusion requirements.

3. As someone has already said, 47% of the cohort had a psychiatric disorder. Now, there is a strange comment in the PACE Trial protocol about the "grey box ineligible for trial", strange because even on the pdf there are three shades of grey (and two textures, 'hashed' and 'smooth'!). But it looks like all sorts of people were eligible for inclusion, including agoraphobics, any phobics, OCD, PTSD, and lifelong psychosis, and there appears to be some confusion between the SCID form and the 'Oxford' form about inclusion/exclusion of bipolars and schizophrenics! Funnily enough, considering the frequent claims about 'personality disorder' in CFS, these are not listed as exclusions (which means all sorts of personality disorders could be included).

4. The PACE version of the London criteria actually used a diagnosis of 'ME' based on: exercise-induced fatigue (who doesn't get tired after exercise?!), where the 'exercise'/'exertion' has to be 'trivial' by self-report; impairment of short-term memory and loss of concentration; fluctuation of symptoms (ubiquitous in all health states and difficult to quantify); 6 months plus duration; no primary depressive illness or anxiety/'neurosis'. That is all that is necessary to meet the criteria for ME (though don't get me started on the instability of the terms anxiety and neurosis!)

5. They have not addressed the issue of abnormal physiological response to exertion, either within the biomedical literature or the reports from patients. This is a major omission. They have NOT considered the differential cohort that would have been established by applying say the Canadian Criteria either, even though this was brought to their attention a good few times. This should have been a limitation of study item (I note there was no such section in the article).

6. Obviously seriously affected/ bedbound etc. weren't included. But they are likely to try and claim slightly and moderately affected are still 'safe' with CBT/GET (and pacing is useless).

There is more to be analysed in the PACE documentation around the cohort. I'm trying to get a t-test done on the subgrouping of patients shown in Table 1 on page 5 of the article pdf, for example.

If anybody is particularly interested in this part of the analysis (cohort problems) and thinks this is worth pursuing and wants to take part - discuss, maybe backchannel, let me know.
It's great if letters can go in on the criteria issue as well as maybe work in other forums.

One point to make might be to point out that people with depression can say they have post-exertional malaise (Leonard Jason's research has shown this) - one needs more detailed questions.

The revised Canadian definition for research (Jason et al., 2010) requires symptoms over the past 6 months to score 2 on both of the following scales to be counted as having the symptom:
1 = a little of the time, 2 = about half the time, 3 = most of the time, 4 = all of the time.
0 = symptom not present, 1 = mild, 2 = moderate, 3= severe, 4 = very severe.
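To make the rule concrete, here is a minimal sketch of the check as I read it. It assumes "score 2 on both" means a score of at least 2 on each scale; the function name and the `threshold` parameter are my own, not from Jason et al.

```python
def counts_as_symptom(frequency, severity, threshold=2):
    """Return True when a symptom meets the threshold on BOTH scales.

    frequency: 1 = a little of the time ... 4 = all of the time
    severity:  0 = not present, 1 = mild ... 4 = very severe
    Assumption: "score 2" is read as "2 or higher".
    """
    return frequency >= threshold and severity >= threshold


# A symptom that is present most of the time but only mild is not counted.
print(counts_as_symptom(frequency=3, severity=1))  # False
# About half the time and moderate: counted.
print(counts_as_symptom(frequency=2, severity=2))  # True
```

The point of requiring both scales is that a yes/no question about post-exertional malaise would count the first case, while this rule does not.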

Jason,L.A., Evans,M., Porter,N., Brown,M., Brown,A., Hunnell,J., Anderson, V., Lerch, A., De Meirleir, K., & Friedberg, F. (2010). The development of a revised Canadian Myalgic Encephalomyelitis-Chronic Fatigue Syndrome case definition. American Journal of Biochemistry and Biotechnology 6 (2): 120-135, 2010 ISSN 1553-3468. Retrieved from: http://www.scipub.org/fulltext/ajbb/ajbb62120-135.pdf
 

Dolphin

Senior Member
Messages
17,567
Clearly, there is considerable caution here during GET to prevent relapses, an almost pacing-like caution. I won't discuss the adverse effects, but I noticed an interesting catch. While GET has rather optimistic goals and is aimed at "encouraging" the increase of activity in carefully planned increments if possible, the PACE results paper does not mention how many people in the GET group actually managed to increase their activity (correct?). It is quite possible that most people in the GET group did not. Without actigraphy we may never know for sure, but perhaps the 6 minute walking distance is a smoking gun.

More from the GET therapist manual: "By week 4, most participants will be able to commence aerobic exercise." I find it very unusual that a group of people who on average are allegedly ready for (light) "aerobic exercise" after only 4 weeks of GET with the gradual aim of several sessions a week of moderate exercise, cannot even break the 400m barrier on a single 6 minute walking distance after 52-weeks of GET when healthy people (including sedentary people) are scoring 600-700m! It is possible that improvers are skewing the average, and vice versa for non-improvers. However, we know that the supposed superiority of GET over SMC isn't impressive on average. Was there even a single recovery???
This is the sort of area I went into in my letter.

I hope others do also - perhaps they can do it in a better way than what I did.

I don't think it's too late to send in a letter given the print version only came out 9 days ago.
 

Angela Kennedy

Senior Member
Messages
1,026
Location
Essex, UK
It's great if letters can go in on the criteria issue as well as maybe work in other forums.

One point to make might be to point out that people with depression can say they have post-exertional malaise (Leonard Jason's research has shown this) - one needs more detailed questions.

The revised Canadian definition for research (Jason et al., 2010) requires symptoms over the past 6 months to score 2 on both of the following scales to be counted as having the symptom:
1 = a little of the time, 2 = about half the time, 3 = most of the time, 4 = all of the time.
0 = symptom not present, 1 = mild, 2 = moderate, 3= severe, 4 = very severe.

Jason,L.A., Evans,M., Porter,N., Brown,M., Brown,A., Hunnell,J., Anderson, V., Lerch, A., De Meirleir, K., & Friedberg, F. (2010). The development of a revised Canadian Myalgic Encephalomyelitis-Chronic Fatigue Syndrome case definition. American Journal of Biochemistry and Biotechnology 6 (2): 120-135, 2010 ISSN 1553-3468. Retrieved from: http://www.scipub.org/fulltext/ajbb/ajbb62120-135.pdf

thanks for this Dolphin.

I will say self-reports of 'post-exertional malaise' are highly uncertain. They can fall into the same category as 'fatigue'. The problem is abnormal physiological response to exercise (and related to neurological symptoms AND SIGNS). How often 'malaise after exertion' happens in self-reports is irrelevant in the circumstances -unless it can be quantified what exactly is being 'felt'. Abnormal physiological response to exertion is quite another matter.

You know, I've been re-reading the 'historical' ME literature of late. There were various signs as well as symptoms that to any sensible doctor demonstrate organic dysfunction. It's not as if there isn't precedent with various infective disease states. I think people (even patients) have begun to forget the neurological nature of the illness, especially when people aren't investigated properly, the psychs claim ME is a belief, and the charities describe ME as if it is tiredness, bad sleeping, and feeling a bit achy in media interviews.

While I do have problems with how easily Ramsay and others allowed the 'hysteria' explanation to take root - looking at the clinical descriptions available, it is amazing to see how the neurological nature of ME has been spun out of existence in modern medical awareness with little objection from medics.

In the future- well after a few letters are published in the Lancet (if they are- I'm worried that very few will be published and not necessarily the best ones)- a key priority will be to take the evidence already available to show why PACE did not establish safety of CBT/GET for neurological ME sufferers: and this may mean going up against NICE again, the PACE authors, even the Lancet, unfortunately, among others. Obviously these will be longer-term goals we need to keep in sight.
 

anciendaze

Senior Member
Messages
1,841
Is that a direct quote from the authors, or just your take on their thinking?

Not having a go. Just that if they actually said that, it could be powerful ammo against them.
My comment is a distillation of repeated statements by the same people. Don't expect them to say anything which can't be weaseled out of w.r.t. this trial. My point was that this unstated assumption was actually the source for the distribution which set the standards for the entire trial. The referenced research appears to yield a very different distribution.

If they are free to make such a choice, so am I. My suggestion was that a uniform distribution covering the range of the trial, plus clustering produced by the sampling process, would also work. In this case, all the SDs result from sampling, not from any characteristic of the population being sampled.
I am not statistically literate, and most of this kind of stuff has me struggling. But this particular point has been bugging me too (if I understand it correctly). If they are calculating the SD based on the study population, without reference to the general population, that would unjustifiably enhance the statistical significance of any treatment response.

They have given us the relative improvement (before/after stuff), but not the absolute frame of reference for the important broader context (real world comparisons to the general population).

Am I reading this right?
Calculating SDs for a sample group is a straightforward numerical task. These numbers are assumed to reflect characteristics of the population under study. The means for these groups of patients are far from the mean for the general population, or even the assumed mean for the subpopulation being sampled. There is good reason to believe the apparent clustering of scores for patients is the result of the selection process which put them in the trial. Subsequent spread of these scores is to be expected. If natural bounds prevent those with very low scores from completing the trial, a general upward trend is predictable. Any reference to this illness should show highly variable individual performance over time, supporting this argument.
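The selection argument can be illustrated with a toy simulation. This is only a sketch under assumptions of my own choosing: each person has a long-run level drawn uniformly from 0 to 100, any single measurement is that level plus random day-to-day variation, and people enter the "trial" only if their first score is at or below a cutoff. The cutoff of 65, the noise SD of 15 and the sample size are all hypothetical, no treatment is applied, and drop-outs are not modelled.

```python
import random
from statistics import mean, stdev


def clamp(x, lo=0.0, hi=100.0):
    """Keep a score inside the scale's natural bounds."""
    return max(lo, min(hi, x))


def simulate(n=20000, noise_sd=15.0, entry_cutoff=65.0, seed=1):
    """Return (baseline, followup) scores for those selected at baseline."""
    rng = random.Random(seed)
    baseline, followup = [], []
    for _ in range(n):
        usual = rng.uniform(0, 100)  # person's long-run level
        first = clamp(usual + rng.gauss(0, noise_sd))
        if first <= entry_cutoff:  # selection into the "trial"
            baseline.append(first)
            # Second measurement: same person, fresh variation, no treatment.
            followup.append(clamp(usual + rng.gauss(0, noise_sd)))
    return baseline, followup


baseline, followup = simulate()
print(f"baseline:  mean {mean(baseline):.1f}, SD {stdev(baseline):.1f}")
print(f"follow-up: mean {mean(followup):.1f}, SD {stdev(followup):.1f}")
```

In this toy setup the selected group's scores are clustered at baseline by the cutoff, then spread out and drift upward at follow-up with no intervention at all. It says nothing about how large such an effect was in the actual trial.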

There is also a correlation between 'adverse events' and improved scores. This suggests GET was particularly effective in causing 'adverse events' which in turn led those who anticipated low scores in the future to drop out, no matter what their last score.

A corollary from this is that the apparent safety of GET may entirely result from careful monitoring during the trial and the wisdom of those who dropped out. This would not transfer to practice unless such oversight is part of treatment, in which case costs remain roughly as high as in this trial.
 

Dolphin

Senior Member
Messages
17,567
More from the GET therapist manual: "By week 4, most participants will be able to commence aerobic exercise." I find it very unusual that a group of people who on average are allegedly ready for (light) "aerobic exercise" after only 4 weeks of GET with the gradual aim of several sessions a week of moderate exercise, cannot even break the 400m barrier on a single 6 minute walking distance after 52-weeks of GET when healthy people (including sedentary people) are scoring 600-700m! It is possible that improvers are skewing the average, and vice versa for non-improvers. However, we know that the supposed superiority of GET over SMC isn't impressive on average. Was there even a single recovery???

The PACE protocol defined "recovery" as follows:

("Recovery" will be defined by meeting all four of the following criteria: (i) a Chalder Fatigue Questionnaire score of 3 or less [27], (ii) SF-36 physical Function score of 85 or above [47,48], (iii) a CGI score of 1 [45], and (iv) the participant no longer meets Oxford criteria for CFS [2], CDC criteria for CFS [1] or the London criteria for ME [40].)
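Spelled out as a check, the protocol definition above looks something like the sketch below. The class and field names are my own. I have read criterion (iv) as "meets none of the three case definitions", which is an assumption about wording that is a little ambiguous.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    chalder_fatigue: int          # Chalder Fatigue Questionnaire score
    sf36_physical_function: int   # SF-36 physical function score
    cgi: int                      # CGI score
    meets_oxford: bool            # still meets Oxford criteria for CFS
    meets_cdc: bool               # still meets CDC criteria for CFS
    meets_london: bool            # still meets London criteria for ME


def is_recovered(a: Assessment) -> bool:
    """All four protocol criteria must hold at the same time."""
    return (
        a.chalder_fatigue <= 3                 # (i)
        and a.sf36_physical_function >= 85     # (ii)
        and a.cgi == 1                         # (iii)
        # (iv) assumption: meets none of the three definitions
        and not (a.meets_oxford or a.meets_cdc or a.meets_london)
    )


# Hypothetical participant: good questionnaire scores but a CGI of 2.
example = Assessment(2, 90, 2, False, False, False)
print(is_recovered(example))  # False
```

Because every criterion must be met, failing any single one is enough to rule out "recovery" under this definition.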

Rates of full recovery were not given in the results (correct?), but you can be sure that if there were an impressive rate of "recoveries" the authors would be proudly announcing them.

At this stage it does seem that the GET rationale of deconditioning has been thoroughly discredited or at least massively exaggerated. Some studies have already found deconditioning does not perpetuate CFS. Another pro-GET study reported that GET changed patients' "perceptions" rather than actual fitness levels. Research like that, combined with the unimpressive PACE results and an earlier meta-analysis of actigraphy results which found no increases in activity, may be why biopsychosocialists (like those who wrote the editorial and conducted the meta-analysis) while not admitting a major chunk of their hypothesis has been debunked (fear-avoidance and deconditioning) are now focusing more on "cognitions" and "perceptions" about symptoms. This of course is plagued by another set of problems.
They didn't report the "recovery" figures; I agree with you - I think if they had got good rates of recovery we would have heard it.

Given the model for GET (i.e. symptoms/deconditioning are temporary and reversible), I think there should be an obligation on them to report the figures or otherwise one doesn't know if the model has been tested:

Theoretical model
GET assumes that CFS/ME is perpetuated by deconditioning (lack of fitness), reduced physical strength and altered perception of effort consequent upon reduced physical activity. A normal process of adaptive change in the body is assumed to occur as a consequence of rest or a reduction in physical functioning, i.e. weakening of muscles, reduction in fitness, ('use it or lose it') and altered perception of effort. Activity can then produce symptoms as a result of these negative changes, as the body is attempting a physical activity beyond its current capacity. These changes are thought to be reversible, and thus improving fitness and physical functioning will alter perception of effort, enable the body to gain fitness and strength, leading to a reduction in symptoms and an increase in activity capacity ('use it and gain it'). Preliminary research suggests that reduced symptoms arise from simply doing a GET programme, rather than necessarily getting fitter, whereas improved function is related to getting fitter and stronger. Participants are encouraged to see symptoms as temporary and reversible, as a result of their current physical weakness, and not as signs of progressive pathology. A mild and transient increase in symptoms is explained as a normal response to an increase in physical activity.

There may be other mechanisms involved in the success of GET apart from reversing deconditioning, including elements of habituation, and positive effects of re-engagement with important activities. GET has also been shown to improve sleep, cognition, and mood; factors that are also likely to perpetuate the condition, although these are not directly addressed by the treatment.
 

Dolphin

Senior Member
Messages
17,567
ETA: This isn't the best reply - others should possibly skip this for the moment e.g. I feel I deal with it better in this post: http://forums.aboutmecfs.org/showth...Trial-Protocol&p=164490&viewfull=1#post164490

The redefined primary outcomes were based on comparing the mean scores between SMC and GET/CBT at 52 weeks. And they weren't looking merely for simply significant difference between groups, but used a slightly higher threshold of a clinically useful difference.

A clinically useful difference between the means of
the primary outcomes was defined as 0.5 of the SD of
these measures at baseline, equating to 2 points for
Chalder fatigue questionnaire and 8 points for short
form-36.

However, using 95% confidence intervals, neither CBT nor GET achieved a clinically useful difference in either fatigue or physical function scores, as shown by the Lancet paper's figure 3. NB: to be significant, the confidence intervals for the treatment differences would have to lie beyond the targets of +8 (PF) and -2 (fatigue), and they do not.

So it seems that the PACE trial failed to hit any of its primary outcomes, even though these had been lowered from those in the protocol.
Good you're still looking but I don't think one can say this.

You have merged together two types of data. Means (SDs) count everyone, improvers and non-improvers/those with and without "clinically useful differences".

What Figure 3 relates to is this sort of data.

So it is true that one can't say that "on average" people doing GET or CBT got an increase of 2 on the fatigue scale, etc. better than the SMC [ETA: if one requires statistical significance].

However, that wasn't the primary outcome measure. The primary outcome measure was simply the mean (SDs) for the two measurements and there were differences there.

They also then looked at "clinically useful differences" using SF-36 PF and fatigue scores. This is now a subset of the data. So for APT, CBT, GET and SMC the figures are 41%, 59%, 61% and 45%. Those aren't the figures being reported in Figure 3.

Mean differences between groups on primary
outcomes almost always exceeded predefined clinically
useful differences for CBT and GET when compared
with [APT and] SMC.
They can say that because all the mean differences for fatigue have an absolute value greater than 2, and all the mean differences for physical functioning are greater than 8 except CBT vs SMC.
 

Dolphin

Senior Member
Messages
17,567
thanks for this Dolphin.

I will say self-reports of 'post-exertional malaise' are highly uncertain. They can fall into the same category as 'fatigue'. The problem is abnormal physiological response to exercise (and related to neurological symptoms AND SIGNS). How often 'malaise after exertion' happens in self-reports is irrelevant in the circumstances -unless it can be quantified what exactly is being 'felt'. Abnormal physiological response to exertion is quite another matter.
Yes, objective measures would be better again.
 

Dolphin

Senior Member
Messages
17,567
Calculating SDs for a sample group is a straightforward numerical task. These numbers are assumed to reflect characteristics of the population under study. The means for these groups of patients are far from the mean for the general population, or even the assumed mean for the subpopulation being sampled. There is good reason to believe the apparent clustering of scores for patients is the result of the selection process which put them in the trial. Subsequent spread of these scores is to be expected. If natural bounds prevent those with very low scores from completing the trial, a general upward trend is predictable. Any reference to this illness should show highly variable individual performance over time, supporting this argument.

There is also a correlation between 'adverse events' and improved scores. This suggests GET was particularly effective in causing 'adverse events' which in turn led those who anticipated low scores in the future to drop out, no matter what their last score.

A corollary from this is that the apparent safety of GET may entirely result from careful monitoring during the trial and the wisdom of those who dropped out. This would not transfer to practice unless such oversight is part of treatment, in which case costs remain roughly as high as in this trial.
I think this point is more true of other previous GET studies than the primary outcome measures here.
There wasn't a huge drop-out rate in this study.
Also, with intention-to-treat analysis one uses the data for virtually everyone, including the drop-outs (see Figure 1). There have been problems with previous studies that used "last value carried forward": they didn't assess people when they dropped out and just used the value from a previous assessment, before the person suffered an exacerbation. I think there was a bit of a mixture here (i.e. some last value carried forward and some taken at the time of withdrawal).
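For anyone unfamiliar with it, "last value carried forward" is simple to sketch. This is a generic illustration, not the trial's actual procedure, and the scores are hypothetical. `None` marks an assessment that was never done.

```python
def last_value_carried_forward(scores):
    """Fill each missing assessment with the most recent observed score.

    Assessments missing before any score was observed stay as None.
    """
    filled, last = [], None
    for score in scores:
        if score is not None:
            last = score
        filled.append(last)
    return filled


# Hypothetical participant who improved, then dropped out after the
# second assessment (perhaps following a relapse nobody measured).
observed = [40, 45, None, None]
print(last_value_carried_forward(observed))  # [40, 45, 45, 45]
```

The filled-in series looks like stable improvement, which is exactly the problem if the person dropped out because they got worse.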
 

oceanblue

Guest
Messages
1,383
Location
UK
I don't think one can say this.

You have merged together two types of data. Means (SDs) count everyone, improvers and non-improvers/those with and without "clinically useful differences".

What Figure 3 relates to is this sort of data.

So it is true that one can't say that "on average" people doing GET or CBT got an increase of 2 on the fatigue scale, etc. better than the SMC.

However, that wasn't the primary outcome measure. The primary outcome measure was simply the mean (SDs) for the two measurements and there were differences there.


They also then looked at "clinically useful differences" using SF-36 PF and fatigue scores. This is now a subset of the data. So for APT, CBT, GET and SMC the figures are 41%, 59%, 61% and 45%. Those aren't the figures being reported in Figure 3.

Thanks for taking a look at this, but I think it's ambiguous. The phrase:
A clinically useful difference between the means of
the primary outcomes
was defined as 0.5 of the SD of
these measures at baseline, equating to 2 points for
Chalder fatigue questionnaire and 8 points for short
form-36.
seems to refer specifically to the difference between means of the primary outcomes that is for the whole groups, not individuals.

So the phrase clinically useful difference appears to be applied to primary outcomes, before they go on to talk about secondary outcomes. If this CUD is not itself a primary outcome, what is it?

And surely we can at least say that the differences between means on primary outcomes do not amount to CUD?

They then quite separately look at clinically useful differences for individuals in a post-hoc analysis that you mention, but that is a separate analysis (which is quite separate from the point I'm trying to make):
A secondary post-hoc analysis compared the
proportions of participants who had improved between
baseline and 52 weeks by 2 or more points of the Chalder
fatigue questionnaire, 8 or more points of the short
form-36, and improved on both:
nb this is relative to baseline, not SMC.

They can say that because all the mean differences for fatigue have an absolute value greater than 2, and all the mean differences for physical functioning are greater than 8 except CBT vs SMC.
but not if you apply confidence levels, which is exactly what they do in fig 3. Surely differences don't really count if they are not statistically significant?
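There are really two separate questions here, and a small sketch may help keep them apart: does the confidence interval exclude zero (statistical significance), and does the whole interval lie at or beyond the clinically useful difference? The function and labels below are my own, differences are expressed so that positive means improvement (so fatigue differences need their sign flipped first), and the intervals in the example are hypothetical, not read from figure 3.

```python
def classify_difference(ci_low, ci_high, threshold):
    """Classify a between-group difference from its confidence interval.

    ci_low, ci_high: interval bounds, with positive = improvement.
    threshold: the clinically useful difference (a positive number).
    """
    if ci_high < 0:
        return "significant in the wrong direction"
    if ci_low <= 0:
        return "not statistically significant"
    if ci_low >= threshold:
        return "whole interval at or beyond the threshold"
    return "significant, but interval includes values below the threshold"


# Hypothetical intervals against a threshold of 2 points.
print(classify_difference(1.5, 5.0, threshold=2))
print(classify_difference(2.4, 6.0, threshold=2))
print(classify_difference(-0.5, 3.0, threshold=2))
```

The first case is the interesting one for this discussion: the point estimate can exceed the threshold while the interval still allows for a true difference smaller than it.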
 

Dolphin

Senior Member
Messages
17,567
Low numbers who did the 6 minute walking test

This partly builds from something Anciendaze said. Maybe it has been said by others.

If one looks at Table 6, the numbers/percentages who did the 6-minute walking test are much lower than the other tests:

588
462
588
587
585
583
589
589

I put these into a stats calculator and did a Chi-Squared test with an expected value of 571.375 (average) http://www.graphpad.com/quickcalcs/chisquared2.cfm

Chi squared equals 23.981 with 7 degrees of freedom.
The two-tailed P value equals 0.0011
By conventional criteria, this difference is considered to be very statistically significant.
The P value answers this question: If the theory that generated the expected values were correct, what is the probability of observing such a large discrepancy (or larger) between observed and expected values? A small P value is evidence that the data are not sampled from the distribution you expected.
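The statistic itself is easy to reproduce by hand. Here is a minimal sketch using the eight counts listed above and the same expected value (their average) for every category. It only computes the chi-squared statistic and degrees of freedom; turning that into a P value needs the chi-squared distribution from a stats package or an online calculator like the one linked.

```python
observed = [588, 462, 588, 587, 585, 583, 589, 589]

# Same expected count for every test: the average of the observed counts.
expected = sum(observed) / len(observed)

# Pearson's chi-squared statistic: sum of (O - E)^2 / E.
chi_squared = sum((o - expected) ** 2 / expected for o in observed)
degrees_of_freedom = len(observed) - 1

print(f"expected = {expected}")
print(f"chi squared = {chi_squared:.3f} with {degrees_of_freedom} degrees of freedom")
```

Almost all of the statistic comes from the single count of 462, which matches the impression one gets from just looking at the list.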

This isn't really surprising from looking at it.

As people probably sat the questionnaires all together, another way of doing it would be to compare the number who did the 6-minute walk test with the number who did the questionnaires (a mean value of 587):

Analyze a 2x2 contingency table
                            Outcome 1   Outcome 2   Total
Did questionnaires (mean)         587          53     640
Did 6-min walking test            462         178     640
Total                            1049         231    1280


Fisher's exact test
The two-tailed P value is less than 0.0001
The association between rows (groups) and columns (outcomes)
is considered to be extremely statistically significant.
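The same test can be sketched without an online calculator. The version below is a plain implementation of the usual two-sided Fisher's exact test (summing the probabilities of all tables no more likely than the observed one, with the margins held fixed); the function name is my own and I can't be sure it matches the calculator's method in every detail.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small tolerance so ties are not lost to floating-point rounding.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Did questionnaires (mean) vs did 6-minute walking test, from the table.
p_value = fisher_exact_two_sided(587, 53, 462, 178)
print(p_value < 0.0001)
```

Note that the questionnaire row uses a mean count, so the table is an approximation to begin with.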

What sort of person would be more likely not to do a 6 minute walking test? My guess is on average a person who either felt iller at the time of the test (than the average person in the group) or felt iller (than the average person in the group) after doing this test earlier in the trial (at baseline and/or 24 weeks). So the final figures for the group as a whole might be lower again if everyone had been included. They could calculate such a figure by using "last value carried forward" (which has often been used in previous CBT and GET studies).
 

Dolphin

Senior Member
Messages
17,567
Firstly, oceanblue, I edited my last reply a bit after initially writing it (but before you replied). Not sure if you saw that. Anyway, not sure if it needs to be re-read as this one deals with it more directly.

So the phrase clinically useful difference appears to be applied to primary outcomes, before they go on to talk about secondary outcomes. If this CUD is not itself a primary outcome, what is it?

And surely we can at least say that the differences between means on primary outcomes do not amount to CUD?
The primary outcomes are what are reported in the abstract.

If one looks at the PACE Trial protocol:
Secondary outcome measures – Secondary efficacy measures
1. The Chalder Fatigue Questionnaire Likert scoring (0,1,2,3) will be used to compare responses to treatment [27].
So here one has something using the Chalder Fatigue Questionnaire (which is used as a primary outcome measure) but this isn't a primary outcome measure.

Similarly they have:
4. "Recovery" will be defined by meeting all four of the following criteria: (i) a Chalder Fatigue Questionnaire score of 3 or less [27], (ii) SF-36 physical Function score of 85 or above [47,48], (iii) a CGI score of 1 [45], and (iv) the participant no longer meets Oxford criteria for CFS [2], CDC criteria for CFS [1] or the London criteria for ME [40].
They could have defined "recovery" as, say, simply:
(i) a Chalder Fatigue Questionnaire score of 3 or less [27] and (ii) SF-36 physical Function score of 85 or above [47,48],
So these would have used the Chalder Fatigue Questionnaire and SF-36 PF but "recovery" could still be a secondary outcome measure.

Surely differences don't really count if they are not statistically significant?
That I'm not sure about i.e. I haven't read enough papers that talk about "clinically useful difference".

They are merging together two concepts in my mind: average differences and "clinically useful differences" which are more commonly thought of in terms of the percentage of people who actually have them, I would have thought.

However, it is interesting to know the size of the difference of the means. One could have something with mean 49 being statistically significantly larger than something with mean 48 if the standard deviations are small. So they could argue they are giving information by discussing this difference.

However there could be problems with this: one could theoretically make points like this if one only had (say) 10% improvers if the improvers improved by a massive amount (say the scale was 0-1000 rather than 0-100) even if 90% didn't improve.
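A quick sketch of that hypothetical: 100 people, of whom 10 improve by 100 points on a 0-1000 scale and 90 don't improve at all. The numbers and the threshold of 8 are purely illustrative.

```python
# 10% improve by a large amount, 90% do not improve at all.
improvements = [100] * 10 + [0] * 90
threshold = 8  # illustrative "clinically useful difference"

mean_improvement = sum(improvements) / len(improvements)
share_improved = sum(1 for x in improvements if x >= threshold) / len(improvements)

print(f"mean improvement = {mean_improvement}")
print(f"share improving by {threshold}+ = {share_improved:.0%}")
```

The mean clears the threshold even though nine in ten people saw no change, which is why the mean difference and the proportion of improvers need to be reported separately.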

Of course, this difference has just popped up at the end - there was no mention of "clinically useful differences" of 8 and 2 in the published protocol.
 

Bob

Senior Member
Messages
16,455
Location
England (south coast)
The PACE Trial provided evidence that 5 sessions of SMC is more effective than 14 sessions of GET or CBT, so why don't they just roll out SMC for the entire ME patient community?
SMC is more effective than psychological interventions, and it would save the NHS money.
And if they provided 10 sessions of SMC, then it might be even more effective than SMC+GET or SMC+CBT, and still save money!

I wonder why we haven't heard about the wonders of SMC from the authors of the study, but only the wonders of CBT and GET, which were less effective!

You've lost me. How was it shown that 5 sessions of SMC is more effective than 14 sessions of GET or CBT?

A friend of mine realised that the effects of SMC (alone) out-performed the effects of both CBT (when considered alone) and GET (when considered alone) on every primary outcome measure.

If we look at the improvements due to GET and CBT only, when we take off the effect due to SMC in the GET and CBT groups, then we find that SMC performs better.

So why are they bothering with all this expensive and ineffective therapy?
And why didn't they mention that SMC was so much more effective than GET and CBT, in the paper or in the press release?

Is a 'control' supposed to out-perform one of the interventions being investigated?

On all primary outcome measures, SMC outperformed CBT and GET...
Doesn't this defeat the whole purpose of the trial?


Examples:

GET with Chalder scores...

SMC (alone) improved by 4.5 points.
GET + SMC improved by 7.6 points.
GET (only) improved by 3.2 points (adjusted mean difference = 3.2)

So...
SMC improvement = 4.5 points
GET improvement = 3.2 points


CBT with Chalder scores...

SMC (alone) improved by 4.5 points
CBT + SMC improved by 7.4 points
CBT (only) improved by 3.4 points (adjusted mean difference = 3.4)

So...
SMC improvement = 4.5 points
CBT improvement = 3.4 points




GET with SF-36 scores...

SMC (alone) = 11.6 point improvement
GET + SMC = 21 point improvement
GET (only) = 9.4 (adjusted mean difference)

So...
SMC = 11.6 point improvement
GET = 9.4 point improvement



CBT with SF-36 scores...

SMC (alone) = 11.6 improvement
CBT + SMC = 19.2 point improvement
CBT only = 7.1 (adjusted)

So...
SMC = 11.6 point improvement
CBT = 7.1 point improvement
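
The subtraction behind these examples can be laid out as a short sketch. The figures are the ones quoted above; the names are just for illustration. The raw subtraction does not always match the adjusted mean difference quoted from the paper, presumably because the adjustment accounts for more than the simple gap between groups.

```python
# Improvements quoted above: (SMC alone, therapy + SMC, adjusted mean difference)
quoted = {
    "GET, Chalder": (4.5, 7.6, 3.2),
    "CBT, Chalder": (4.5, 7.4, 3.4),
    "GET, SF-36": (11.6, 21.0, 9.4),
    "CBT, SF-36": (11.6, 19.2, 7.1),
}

def raw_extra(smc_alone, combined):
    """Improvement in the combined arm over and above SMC alone."""
    return round(combined - smc_alone, 1)

for label, (smc, combined, adjusted) in quoted.items():
    extra = raw_extra(smc, combined)
    print(f"{label}: SMC alone {smc}, raw extra {extra}, adjusted {adjusted}")
```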
 

Dolphin

Senior Member
Messages
17,567
A friend of mine realised that the SMC control groups performed better than the psychological interventions (CBT and GET). Doesn't this defeat the whole project?

If we look at the improvements due to GET and CBT only when we take off the effect due to SMC in the GET and CBT groups, then we find that SMC performs better.

So why are they bothering with all this expensive and ineffective therapy?
And why didn't they mention that SMC was so much more effective than GET and CBT, in the paper or in the press release?

On all primary outcome measures, SMC outperformed CBT and GET...

OK, I get you now.
Interesting point.

Although part of those improvements may not be due to the SMC treatment as such but other factors e.g. average improvements with the passage of time.

Also, although I'm not sure it was meant to happen, the SMC only group had quite a lot more appointments than the SMC + other groups:
Specialist medical care sessions attended, median (interquartile range):
APT 3 (3-4); CBT 3 (3-4); GET 3 (3-4); SMC 5 (3-6); p<0.0001
 

Bob

Senior Member
Messages
16,455
Location
England (south coast)
Although part of those improvements may not be due to the SMC treatment as such but other factors e.g. average improvements with the passage of time.

Also, although I'm not sure it was meant to happen, the SMC only group had quite a lot more appointments than the SMC + other groups:

So we need to know whether there was a control group in this study or not? (Clearly not.)
So then how did a £5m government-funded study get the go-ahead without a proper control group, and how did it get published?

They can't try and weasel their way out of these results by saying that there wasn't a control group! (Well, I'm sure that they can, but they shouldn't be able to!)
 

Dolphin

Senior Member
Messages
17,567
Population norm for the SF-36

This point may have been made earlier:

This range was defined as
less than the mean plus 1 SD scores of adult attendees
to UK general practice of 14.2 (+4.6) for fatigue (score
of 18 or less) and equal to or above the mean minus 1 SD
scores of the UK working age population of 84 (–24) for
physical function (score of 60 or more).32,33

(Ref. 32 is for the Chalder fatigue scale)

33. Bowling A, Bond M, Jenkinson C, Lamping DL. Short form 36
(SF-36) health survey questionnaire: which normative data should
be used? Comparisons between the norms provided by the
Omnibus Survey in Britain, The Health Survey for England and
the Oxford Healthy Life Survey. J Publ Health Med 1999, 21: 255–70.
I'm finally getting to read Bowling et al. (had started it weeks ago).

Free full text:
http://jpubhealth.oxfordjournals.org/content/21/3/255.full.pdf+html

I think the data above refers to the data in Table 3:
Physical Functioning
Mean (SD) (n)
Male: 86.3 (22.5) (925)
Female: 81.8 (27.7) (1117)

These are not working age figures - they also include quite a lot of people aged 65+ who, not surprisingly, have worse figures e.g. those aged 85+ have a mean of 39.3 (31.5) (although they're the smallest group).
-------------
I’ve attached a file where I’ve calculated the means from scratch.
The mean I get for Table 3 is 82.28834.
However, if one restricts it to working age (16-64) then the mean (SF-36 PF) score is 89.76932.

It’s late here and it’s a long time since I’ve played around with variances; I think I’ve done something wrong trying to calculate the combined variance but I can’t think what. I think I’ve calculated the variance of the mean of X when I should be calculating the variance of X.
Anyway, the basic rules are Var(aX) = a^2 Var(X), where a is a constant, and Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) when X and Y are independent.
So what I did was Var((a1X1 + a2X2 + … + aiXi)/n) = [Var(a1X1) + Var(a2X2) + … + Var(aiXi)] * 1/n^2 = [a1^2 Var(X1) + a2^2 Var(X2) + … + ai^2 Var(Xi)] * 1/n^2, which gives 7.864014 (as I say, I’m fairly sure this isn’t the figure to use; this is also not a working age figure).
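
For the variance of X itself (rather than the variance of the mean), one standard approach is to combine the subgroup summaries: the overall variance is the within-group variance plus the spread of the group means around the overall mean. A minimal sketch, using only the male and female rows quoted above as input rather than the full age breakdown in the attached file, so it is not expected to reproduce the 82.28834 figure:

```python
import math

def combine_groups(groups):
    """Combine (mean, sd, n) summaries into an overall mean and sample SD."""
    total_n = sum(n for _, _, n in groups)
    overall_mean = sum(mean * n for mean, _, n in groups) / total_n
    # Spread inside each group, from the reported sample SDs.
    within = sum((n - 1) * sd**2 for _, sd, n in groups)
    # Spread of the group means around the overall mean.
    between = sum(n * (mean - overall_mean) ** 2 for mean, _, n in groups)
    overall_sd = math.sqrt((within + between) / (total_n - 1))
    return overall_mean, overall_sd

# Male and female physical functioning rows as quoted from Table 3.
mean, sd = combine_groups([(86.3, 22.5, 925), (81.8, 27.7, 1117)])
print(round(mean, 2), round(sd, 2))
```

The same function should work on the age-band rows; restricting the input to the 16-64 bands would give a working-age mean and SD to set against the 84 (–24) used in the paper.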
 

Attachments

  • Bowlingdata.pdf

oceanblue

Guest
Messages
1,383
Location
UK
The way I see it is that all of the interventions must contribute both placebo and therapeutic effects.
Take the GET SF-36 scores
36.7 is the baseline score
50.8 = baseline + SMC placebo + SMC therapeutic
57.7 = baseline + SMC placebo + SMC therapeutic + GET placebo + GET therapeutic
Natural recovery plays a role?
I think there may be a third factor at work: natural recovery. The SMC group had only been ill for an average of 2 years (about 3 years for the GET/CBT groups), which I think is far less than in other trials. Now, given that the 'improvement' threshold is set very low (as Dolphin notes), it's possible that small improvements due to natural recovery in the SMC group (and other groups too) contribute significantly to the overall improvement seen. I don't think there is good data on recovery rates for ME patients in general, and particularly not for those who have only been ill for 2-3 years, so there isn't a lot to guide us.

That said, according to the graphs in fig 2, most of the improvement over the year occurs in the first 12 weeks, suggesting the therapeutic/placebo effects are bigger than natural recovery effects.
 

oceanblue

Guest
Messages
1,383
Location
UK
Re Bowling SF36 data:

I think the data above refers to the data in Table 3:
Physical Functioning
Mean (SD) (n)
Male: 86.3 (22.5) (925)
Female: 81.8 (27.7) (1117)

These are not working age figures - they also include quite a lot of people aged 65+ who, not surprisingly, have worse figures e.g. those aged 85+ have a mean of 39.3 (31.5) (although they're the smallest group).
-------------
I've attached a file where I've calculated the means from scratch.
The mean I get for Table 3 is 82.28834.

However, if one restricts it to working age (16-64) then the mean (SF-36 PF) score is 89.76932.
Thanks for this graft - I'm definitely not the person to check the maths.
I did wonder if they used a version of working age population, matched to the sex ratio of trial participants, though that would still probably give an SF36 above the figure of 84 they used.
 

anciendaze

Senior Member
Messages
1,841
I think this point is more true of other previous GET studies than the primary outcome measures here.
There wasn't a huge drop-out rate in this study.
Also with intention-to-treat analysis one uses the data for virtually everyone including the drop-outs (see Figure 1); there have been problems with previous studies where they used "last value carried forward", where they didn't assess people when they dropped out and just used the value at a previous assessment before the person suffered an exacerbation. I think there was a bit of a mixture here (i.e. some last value carried forward and some taken at the time of withdrawal).
A couple of points: 1) the patients in this trial were all very low on the scale; 2) changing the definition of an adverse event meant patients had to go longer before an adverse event would be recorded. Without access to raw data, and a check on assessments by researchers, we are dependent on their assurances that dropouts did not have a significant effect. Considering that careful analysis raises doubts that reported results are statistically or clinically significant, this is a real problem.

A second problem results from the astonishing number of adverse events. It is entirely possible improvements are correlated not with treatment, but with responses to adverse events. Because this was a carefully-supervised study where results were recorded, therapists and clinicians were more careful than usual in dealing with these. (I believe we all have experience with clinicians totally ignoring reported setbacks.) This would say GET showed larger gains precisely because it caused more adverse events.

The argument that setbacks were random favors random-walk models, in which an initial cluster at the low end, produced by sampling, diffuses to a range of values, most higher.
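
A toy sketch of that random-walk picture, with no treatment effect at all: every simulated patient starts at the same low score and takes unbiased random steps on a 0-100 scale. The starting score, step size, number of steps and number of patients are all made-up values, not trial data.

```python
import random

def simulate(n_patients=2000, start=5.0, steps=50, step_size=5.0, seed=1):
    """Unbiased random walk on a 0-100 scale, clipped at the ends."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_patients):
        score = start
        for _ in range(steps):
            score += rng.uniform(-step_size, step_size)  # no treatment effect
            score = min(100.0, max(0.0, score))  # scale floor and ceiling
        finals.append(score)
    return finals

finals = simulate()
mean_final = sum(finals) / len(finals)
share_higher = sum(f > 5.0 for f in finals) / len(finals)
print(round(mean_final, 1), round(share_higher, 2))
```

Because the floor blocks further falls, the cluster can only spread upwards, so the average rises and most final scores end above the starting point even though each individual step is as likely to go down as up.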