• Welcome to Phoenix Rising!

    Created in 2008, Phoenix Rising is the largest and oldest forum dedicated to furthering the understanding of, and finding treatments for, complex chronic illnesses such as chronic fatigue syndrome (ME/CFS), fibromyalgia, long COVID, postural orthostatic tachycardia syndrome (POTS), mast cell activation syndrome (MCAS), and allied diseases.

    To become a member, simply click the Register button at the top right.

Cognitive Behavioral Therapy and Graded Exercise for CFS: A Meta-Analysis (Castell)

Dolphin

Senior Member
Messages
17,567
Here's the abstract. I'll put some comments in messages below it.

Cognitive Behavioral Therapy and Graded Exercise for Chronic Fatigue Syndrome: A Meta-Analysis

Bronwyn D. Castell, Nikolaos Kazantzis, Rona E. Moss-Morris

Article first published online: 19 DEC 2011

DOI: 10.1111/j.1468-2850.2011.01262.x

Several reviews have concluded that graded exercise therapy (GET) and cognitive behavioral therapy (CBT) may be the most efficacious treatments for chronic fatigue syndrome (CFS).

The current review extends the evidence for overall and outcome-specific effects of CBT and GET by directly comparing the treatments and addressing the methodological limitations of previous reviews.

GET (n = 5) and CBT (n = 16) randomized controlled trials were meta-analyzed.

Overall effect sizes suggested that GET (g = 0.28) and CBT (g = 0.33) were equally efficacious.

However, CBT effect sizes were lower in primary care settings and for treatments offering fewer hours of contact.

The results suggested that both CBT and GET are promising treatments for CFS, although CBT may be a more effective treatment when patients have comorbid anxiety and depressive symptoms.
(I've given each sentence its own paragraph)
 

Dolphin

Senior Member
Messages
17,567
On the authors

The only one of these three names I recognise without any research is Rona E. Moss-Morris.
She's a fairly hardline CBT School psychologist.

Bronwyn D. Castell is a PhD student who could perhaps be easily led (?).

Generally, this review seems reasonable enough but there are a few points where RMM's bias might be in evidence.
 

Dolphin

Senior Member
Messages
17,567
The main numbers

Overall effect sizes for CBT: g = 0.33 (95% CI: 0.10-0.56) (i.e. not large)

Overall effect sizes for GET: g = 0.28 (95% CI: 0.06-0.51) (i.e. not large)

The standardized mean difference coefficient (d) (Cohen, 1988) was used as
the effect size (ES) statistic, adjusted using Hedges correction,
g (Hedges & Olkin, 1985). Study-level estimates of
ES were weighted using the inverse weight procedure, w
(Lipsey & Wilson, 2001). For the primary analysis, estimates
of d were based on a fixed-effects model, except
where significant heterogeneity was present, in which case
a random-effects model was utilized. For between-group
comparisons and moderator analyses, all estimates of d
were based on a mixed-effects model. Substantive
interpretations of effect size magnitude were based on
recommendations by Cohen (1988).
Cohen, J. (1988). Statistical power analysis for the behavioral sciences
(2nd ed.). Hillsdale, NJ: Erlbaum.

I don't have that book, but the standard figures quoted are:
d=0.2 (small effect)
d=0.5 (moderate effect)
d=0.8 (large effect)
Longer piece from Wikipedia below:

They claim:

The mean post-treatment effect size (ES) was of a small to medium magnitude (g = 0.26, confidence interval [CI] = 0.08, 0.43, p < .01).

The mean post-treatment effect size was of a medium magnitude slightly greater than the GET trials (g = 0.33, CI = 0.23, 0.43, p < .001).

I'm not that familiar with effect sizes, but I think g = 0.26 would more commonly be called "small" in magnitude rather than "small to medium".



http://en.wikipedia.org/wiki/Effect_size

"Small", "medium", "large"

Some fields using effect sizes apply words such as "small", "medium" and "large" to the size of the effect. Whether an effect size should be interpreted as small, medium, or large depends on its substantive context and its operational definition. Cohen's conventional criteria of small, medium, or big[6] are near-ubiquitous across many fields. Power analysis or sample size planning requires an assumed population parameter for effect sizes. Many researchers adopt Cohen's standards as default alternative hypotheses. Russell Lenth criticized them as "T-shirt effect sizes":[22]

This is an elaborate way to arrive at the same sample size that has been used in past social science studies of large, medium, and small size (respectively). The method uses a standardized effect size as the goal. Think about it: for a "medium" effect size, you'll choose the same n regardless of the accuracy or reliability of your instrument, or the narrowness or diversity of your subjects. Clearly, important considerations are being ignored here. "Medium" is definitely not the message!
For Cohen's d an effect size of 0.2 to 0.3 might be a "small" effect, around 0.5 a "medium" effect and 0.8 to infinity, a "large" effect.[6]:25 (But note that the d might be larger than one)
Cohen's text[6] anticipates Lenth's concerns:

"The terms 'small,' 'medium,' and 'large' are relative, not only to each other, but to the area of behavioral science or even more particularly to the specific content and research method being employed in any given investigation....In the face of this relativity, there is a certain risk inherent in offering conventional operational definitions for these terms for use in power analysis in as diverse a field of inquiry as behavioral science. This risk is nevertheless accepted in the belief that more is to be gained than lost by supplying a common conventional frame of reference which is recommended for use only when no better basis for estimating the ES index is available." (p. 25)

In an ideal world, researchers would interpret the substantive significance of their results by grounding them in a meaningful context or by quantifying their contribution to knowledge. Where this is problematic, Cohen's effect size criteria may serve as a last resort.[3]
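The effect-size machinery quoted above (Cohen's d with the Hedges small-sample correction, and inverse-variance weighting of study-level estimates) can be sketched as follows. This is a minimal illustration using the standard textbook formulas (Hedges & Olkin; Lipsey & Wilson), not the authors' actual code:

```python
import math

def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference d, with Hedges' small-sample correction."""
    # Pooled standard deviation across the treatment and control groups
    sp = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))
    d = (mean_t - mean_c) / sp
    # Hedges' correction factor J ~ 1 - 3 / (4*df - 1)
    j = 1 - 3 / (4 * (n_t + n_c - 2) - 1)
    return j * d

def inverse_variance_pool(gs, group_sizes):
    """Fixed-effect pooled g: each study weighted by the inverse of its variance."""
    weights = []
    for g, (n_t, n_c) in zip(gs, group_sizes):
        # Approximate sampling variance of a standardized mean difference
        var = (n_t + n_c) / (n_t * n_c) + g**2 / (2 * (n_t + n_c))
        weights.append(1 / var)
    return sum(w * g for w, g in zip(weights, gs)) / sum(weights)
```

So a trial with identical group means yields g = 0, and a pooled estimate is just a weighted average of the per-study g values, with larger trials counting for more.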
 

Dolphin

Senior Member
Messages
17,567
Table 2. Domains covered by the methodological quality rating scale
Item Numbers    Domain Covered
1-4, 11         Adequacy and clarity of the sampling method
5-8             Adequacy of outcome assessment
9-10, 12, 22    Quality of the study design
13              Manualization of treatment
14-17           Control for therapist-level bias
18              Control for concomitant treatments
19              Handling of attrition
20              Adequacy of statistical analysis
21              Consideration of clinical significance

Scores ranged from 39% to 75% for GET and from 25% to 75% for CBT. The PACE Trial was the trial that scored 75%.
 

Sean

Senior Member
Messages
7,378
Thanks for that, Dolphin.

Overall effect sizes for CBT: g = 0.33 (95% CI: 0.10-0.56)

Overall effect sizes for GET: g = 0.28 (95% CI: 0.06-0.51)

The results suggested that both CBT and GET are promising treatments for CFS,...
Bollocks.

The results of this review suggest the opposite: that after being tested to death for 20 years, often in highly (even unduly) favourable circumstances, CBT and GET have clearly failed to deliver any substantial therapeutic benefit.


The eyes are not here
There are no eyes here
In this valley of dying stars
In this hollow valley
 

Snow Leopard

Hibernating
Messages
5,902
Location
South Australia
Also note that the meta-review neglected any discussion of safety...

Moss-Morris, in a previous study on MS-related fatigue, suggested that self-report questionnaires are subject to a variety of biases and that actigraph data should be used to demonstrate reductions in disability. Yet there was no discussion of that in this meta-review.

My letter-writing fingers are tingling...
 

oceanblue

Guest
Messages
1,383
Location
UK
Do the effect sizes relate to fatigue, physical activity/function or both?

Either way, they are not very big. I've seen different definitions of "small" and "medium" - the problem is that Cohen's original paper only specified single values, e.g. 0.2 is small, not ranges (e.g. small is 0.1-0.3, or whatever).

Here's one opinion:
If you have two separate groups (in other words you conducted an independent sample t test), you use the pooled standard deviation instead of the standard deviation.

If Cohen's d is bigger than 1, the difference between the two means is larger than one standard deviation; anything larger than 2 means that the difference is larger than two standard deviations. It is seldom that we get such big effect sizes with the kinds of programmes that I evaluate, so the following rule of thumb applies:

A d value between 0 to 0.3 is a small effect size, if it is between 0.3 and 0.6 it is a moderate effect size, and an effect size bigger than 0.6 is a large effect size.
NB: 1.0 is not the maximum effect size; it's just one standard deviation. 2 or more is possible.
 

Dolphin

Senior Member
Messages
17,567
What about long term followup?
Effect sizes were calculated for post-treatment outcomes only, as sufficient follow-up data to calculate effect sizes were often omitted from reports.

Some studies have suggested a reduction in efficacy with time. A figure would have been interesting.
 

Dolphin

Senior Member
Messages
17,567
Do the effect sizes relate to fatigue, physical activity/function or both?
Fatigue
Functional Impairment (generally SF-36, by which I presume they mean the SF-36 PF subscale)
Depression
Anxiety

Four outcome categories were used to estimate the
effect of each treatment on fatigue, anxiety, depression,
and functional impairment. The Hospital Anxiety and
Depression Scale was the most common measure of
mood, while the Chalder Fatigue Scale and the SF-36
were the most commonly used measures of fatigue and
functional impairment, respectively. A complete list of
the measures used to represent each category is available
from the corresponding author. Separate effect sizes
were calculated for each relevant outcome measure. As
a result, up to four effect sizes originated from each
study. In the primary analysis, these effect sizes were
aggregated (assuming dependence), so that each study
contributed a single weighted ES to the overall estimate
of effect.
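The aggregation step described in that extract can be sketched roughly as follows. Note this uses a simple within-study mean, whereas the paper's aggregation "assuming dependence" is more involved, and the study values here are invented purely for illustration:

```python
def study_level_es(outcome_es):
    """Collapse a study's outcome-level effect sizes into a single value
    (simplified: plain mean; the paper adjusts for dependence between outcomes)."""
    return sum(outcome_es) / len(outcome_es)

# Hypothetical studies, each reporting up to four outcome categories
# (fatigue, anxiety, depression, functional impairment)
studies = [
    {"fatigue": 0.36, "impairment": 0.36},
    {"fatigue": 0.41, "depression": 0.15},
]

# Each study then contributes exactly one ES to the overall estimate
per_study = [study_level_es(list(s.values())) for s in studies]
```

The point of collapsing to one ES per study is that a study reporting four outcomes shouldn't count four times in the overall pooled estimate.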
 

oceanblue

Guest
Messages
1,383
Location
UK
Fatigue
Functional Impairment (generally SF-36, by which I presume they mean the SF-36 PF subscale)
Depression
Anxiety
Thanks.

Odd they should give anxiety and depression equal weight with fatigue and impairment, since the former are not CFS symptoms. Also, I'm pretty sure that CBT has been shown to give medium/large effects on anxiety and depression - so would the effect size have been smaller if calculated just for fatigue/function? This hints it might have been.
although CBT may be a more effective treatment when patients have comorbid anxiety and depressive symptoms.
 

Enid

Senior Member
Messages
3,309
Location
UK
What the b..... h ... is "co-morbid"? This lot continues to bark up the wrong tree - every illness brings what may appear, to the ignorant, to be a co-morbidity. Of course one is thoroughly ill.
 

Dolphin

Senior Member
Messages
17,567
Thanks.

Odd they should give anxiety and depression equal weight with fatigue and impairment, since the former are not CFS symptoms. Also, I'm pretty sure that CBT has been shown to give medium/large effects on anxiety and depression - so would the effect size have been smaller if calculated just for fatigue/function? This hints it might have been.
Yes, it is a bit odd. The paper says Malouff et al. (2008) didn't include them so perhaps they were trying to be different/justify having another meta-analysis on virtually the same data:

Effect sizes:

CBT:
Fatigue: 0.36
Functional Impairment: 0.36
Depression: 0.32
Anxiety: 0.15

GET:
Fatigue: 0.41
Functional Impairment: 0.39
Depression: 0.15 (not significant)
Anxiety: 0.01 (not significant)
 

oceanblue

Guest
Messages
1,383
Location
UK
Yes, it is a bit odd. The paper says Malouff et al. (2008) didn't include them so perhaps they were trying to be different/justify having another meta-analysis on virtually the same data:

Effect sizes:

CBT:
Fatigue: 0.36
Functional Impairment: 0.36
Depression: 0.32
Anxiety: 0.15

GET:
Fatigue: 0.41
Functional Impairment: 0.39
Depression: 0.15 (not significant)
Anxiety: 0.01 (not significant)
Oh, so the effect is bigger without anxiety/depression! Thanks for the info.
 

Dolphin

Senior Member
Messages
17,567
Sentences like the following:
although CBT may be a more effective treatment when patients have comorbid anxiety and depressive symptoms.
can give the impression that they did an analysis and those patients who had comorbid anxiety and depressive symptoms had a different overall response.

Indeed, Knoop et al. write:
After aggregating several intervention studies,
Castell et al. (2011) found that CBT may be more
effective than GET for patients with comorbid depression
and anxiety. Other patient characteristics that
moderate the treatment response have been identified,
of which having a comorbid medical condition, receiving
or applying for sickness benefit, and membership of
a self-help group are a few examples. All the preceding
variables predicted a poor treatment outcome (Bentall,
Powell, Nye, & Edwards, 2002; Knoop et al., 2007).
Most of the moderators found so far only explain a limited amount of the variance in treatment outcome.
More knowledge about patient characteristics that
moderate the treatment response can be used to
improve the outcome of CBT by developing additional
treatment strategies or selecting those patients who
have a reasonable chance of improving as a result of
receiving CBT.
However, the Castell et al. paper in fact doesn't look at anxiety or depression as moderators. What the sentence is trying to say is that if somebody has anxiety or depressive symptoms, there is a good chance those symptoms won't be changed (i.e. scores on the scales won't change) by GET.
---
ETA: I've looked at it again. While GET's CIs overlap with 0 (no effect) and CBT's don't, there was no statistically significant difference between CBT and GET, which shows of course that the raw numbers still have a value, as the significance test alone hides the numerical differences, etc.
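On the CI point generally: whether individual confidence intervals cross zero doesn't by itself settle whether two effect sizes differ significantly from each other; the difference has to be tested directly. A rough sketch using the overall g values and CIs quoted in this thread, under a normal approximation with the standard error recovered from the CI width (an illustration, not the paper's actual comparison):

```python
import math

def se_from_ci(lo, hi, z_crit=1.96):
    # Recover the standard error from a 95% CI under a normal approximation
    return (hi - lo) / (2 * z_crit)

def diff_z(g1, ci1, g2, ci2):
    # z statistic for the difference between two independent effect sizes
    se_diff = math.sqrt(se_from_ci(*ci1) ** 2 + se_from_ci(*ci2) ** 2)
    return (g1 - g2) / se_diff

# Overall estimates quoted above: CBT g = 0.33 (0.10-0.56), GET g = 0.28 (0.06-0.51)
z = diff_z(0.33, (0.10, 0.56), 0.28, (0.06, 0.51))
```

Here |z| comes out well under 1.96, so despite the different point estimates there is no significant CBT-vs-GET difference, consistent with what the paper reports.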
 

Enid

Senior Member
Messages
3,309
Location
UK
Barking up the wrong tree - like scrabbling around for the real in dust whilst science passes them by.
 

Dolphin

Senior Member
Messages
17,567
Any analysis of the criteria used in the studies included in the review?
Yes, effect sizes for studies with different criteria were:
Australian (n=2): 0.16 (not significant)
CDC (n=8): 0.36
Oxford (n=4): 0.53

There was not a statistically significant difference (although I'd say it's hard to achieve with such small numbers).

They didn't do this analysis for GET, as they said there were not enough studies. For GET, 3/5 are Oxford and 2/5 are CDC.

(CDC here means Fukuda rather than the (so-called) empiric criteria)
 

Dolphin

Senior Member
Messages
17,567
Misc. comments, for what they are worth

CFS is characterized by unexplained recurrent or chronic periods
of disabling fatigue, which must have been present
for at least 6 months (Fukuda et al., 1994; Lloyd,
Hickie, Boughton, Spence, & Wakefield, 1990; Sharpe
et al., 1991).
That's it. No mention of other symptoms!!

--------
Although the etiology is subject to debate, there is an increasing
consensus that CFS is multifaceted and heterogeneous
in nature (see Wessely, Hotopf, & Sharpe, 1998, for a
review).
And yet we are expected to have the one treatment approach for this heterogeneous condition.

--------

However, where trials employed elements
from both or other treatments, the trial protocol
was designated according to the treatment most closely
matching the actual interventions employed. For this
reason, two trials published as patient education with
graded exercise and pragmatic rehabilitation (Powell,
Bentall, Nye, & Edwards, 2001; Wearden et al., 2010)
were included in the CBT analysis on the basis that
the intervention incorporated key aspects of the cognitive
behavioral model of CFS, including targeting
beliefs regarding symptoms using alternative, evidence-based
explanations.
So now, not only are the treatments being called evidence-based, but the weird and wonderful theories they present to patients are as well!

---------

Although CBT resulted in significant
improvements in anxiety, this effect size was small
(g = 0.15) and almost nonexistent for GET (g = 0.01).
It is possible that being the subject of a research investigation
may be anxiety-provoking in and of itself. Further,
patients with CFS are prone to frequent relapse
(Whitehead, 2006), the fear of which may also be a
source of continuing anxiety and difficult to disconfirm
during a short treatment period.
This looks like stretching by the authors.

Whether a research situation would be that different from an ordinary CBT or GET program where patients could also be anxious about the treatment, is far from clear.

All the last sentence is saying is that anxiety might be difficult to treat. But the way it's written, one might think they were connecting it with the interventions.
Of course, it could be that the interventions themselves are anxiety-provoking, perhaps because of fear of relapsing (which cancels out other anxiety-lessening elements, such as having time with a therapist); indeed, the therapy might have induced a relapse or flare-up at some stage during the period of therapy. Another therapy, like pacing, might be less anxiety-provoking.

------

However, the FINE trial, which
utilized a range of nonexpert therapists, returned the
most robust effect size in terms of magnitude and
breadth of the confidence interval (Wearden et al.,
2010). The FINE trial used the same physiological
explanation as Powell and colleagues' study. This suggests
that a physiologically, rather than psychologically,
based explanation of chronicity may be viewed as more
credible by patients. Indeed, patients with CFS, as a
group, tend to be resistant to suggestions that their illness
may be psychogenic in nature, which may
be perceived as invalidating (cf. Clark, Buchwald,
MacIntyre, Sharpe, & Wessely, 2002; Moss-Morris &
Petrie, 2000).
There's a bit above this that might make this extract clearer, but I should probably not quote too much.

I find this argument strange, if I understand it correctly. What they seem to be saying is that there was less variation between the different outcome measures in FINE, that this is important, and that it ties in with the Powell et al. and Moss-Morris et al. studies, which had the biggest effect sizes overall. Out of the 21 studies, FINE came 12th (where 1st is best).
 

Snow Leopard

Hibernating
Messages
5,902
Location
South Australia
I don't understand that FINE study comment. Are they saying the weak results (breadth of confidence interval, LOL) are due to the "physical" explanation provided by the nurses? That comment confuses me...