• Welcome to Phoenix Rising!

    Created in 2008, Phoenix Rising is the largest and oldest forum dedicated to furthering the understanding of and finding treatments for complex chronic illnesses such as chronic fatigue syndrome (ME/CFS), fibromyalgia (FM), long COVID, postural orthostatic tachycardia syndrome (POTS), mast cell activation syndrome (MCAS), and allied diseases.


PACE Trial and PACE Trial Protocol

Dolphin

Senior Member
Messages
17,567
"significance level for ... secondary outcome variables [will be] P = 0.01"

I haven't been in this thread for a while - hope to get back later or tomorrow.

Anyway, some people may remember I was going on about this point (apologies if somebody pointed this out - I haven't caught up):
Results from all analyses will be summarised as differences between
percentages or means together with 95% confidence limits (CL). The
significance level for all analyses of primary outcome variables will be P =
0.05 (two-sided); for secondary outcome variables, P = 0.01 (two-sided)
unless profiles of response can be specified in advance.

Somebody pointed out what they probably mean is the following:
I think they are meaning that they would use one-tailed tests and not two-tailed for those that they could do a priori as opposed to post hoc analysis.
So basically what one is looking at there is p<0.02 for two-tail being equivalent to p<0.01 for one tail (area of 0.01 on each end).

Some of what they claimed were differences would satisfy this criterion. I personally don't think results which don't meet statistical significance should be ignored. But they should make it clearer if they are claiming they are talking about trends. The wording gives the impression that anything with p<0.05 is statistically significant. And the p<0.01 threshold they mentioned in the published protocol paper is nowhere to be found in the final paper.
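To illustrate the equivalence mentioned above, here's a quick Python sketch (using a z-test critical value as a stand-in for whatever test they actually used):

```python
from statistics import NormalDist

# For a z-test, the critical value for a one-tailed test at p = 0.01 is the
# same as for a two-tailed test at p = 0.02 (0.01 in each tail).
z_one_tailed = NormalDist().inv_cdf(1 - 0.01)      # one-tailed, alpha = 0.01
z_two_tailed = NormalDist().inv_cdf(1 - 0.02 / 2)  # two-tailed, alpha = 0.02

print(round(z_one_tailed, 3), round(z_two_tailed, 3))  # both ~2.326
```

So a result reported as significant at p<0.01 one-tailed is the same result as p<0.02 two-tailed, which is why the protocol wording matters.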

Table 6 shows other secondary outcomes. At 52 weeks,
participants in the CBT and GET groups had better
outcomes than did participants in the APT and SMC groups
for work and social adjustment scores, sleep disturbance,
and depression (with the one exception that GET was no
different from APT for depression). Anxiety was lower after
CBT and GET than it was after SMC, but not than after
APT. There were fewer chronic fatigue syndrome symptoms
after CBT than there were after SMC. Poor concentration
and memory did not differ between groups. Postexertional
malaise was lower after CBT and GET than it was after APT
and SMC. 6-min walking distances were greater after GET
than they were after APT and SMC, but were no different after
CBT compared with APT and SMC. There were no
differences in any secondary outcomes between APT and
SMC groups (webappendix pp 69).
 

Dolphin

Senior Member
Central Limit Theorem

The Central Limit Theorem is a funny concept. It doesn't seem particularly intuitive.

I think the best way to understand it is to look at some demos and even play around with some distributions including skewed ones (make one like the SF-36 PF) and learn from it.

Googling "central limit theorem demo" gives links to some demos e.g.
http://www.stat.sc.edu/~west/javahtml/CLT.html

http://onlinestatbook.com/stat_sim/sampling_dist/index.html

http://cnx.org/content/m11186/latest/

Spend 10, 20 or 30 minutes on them and hopefully one will have the concept for life.

Importantly, it does not say that every distribution is normally distributed (it refers to means).
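For anyone who'd rather see it in code than in an applet, here's a rough Python sketch of the same idea, using a made-up ceiling-skewed distribution (loosely in the spirit of SF-36 PF scores, not real data):

```python
import random
import statistics

random.seed(1)

# A strongly skewed "population", piled up near the top of a 0-100 scale
# (values here are invented, not real SF-36 data).
population = [min(100, int(random.expovariate(1 / 20))) for _ in range(100_000)]
population = [100 - p for p in population]  # skew it toward the high end

# Distribution of individual scores: skewed.
# Distribution of the means of many samples of size n: approximately normal,
# centred on the population mean, with spread about sigma / sqrt(n).
n = 50
sample_means = [statistics.mean(random.sample(population, n)) for _ in range(2000)]

print(statistics.mean(population))      # population mean
print(statistics.mean(sample_means))    # close to the population mean
print(statistics.pstdev(sample_means))  # close to sigma / sqrt(n)
```

The individual scores stay skewed no matter how many you draw; it's only the sample means that become approximately normal.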
 

Dolphin

Senior Member
I can't remember whether this has been highlighted or not:
One of their references is:
Guyatt GH, Osoba D, Wu AW, et al. Methods to explain the clinical
significance of health status measures. Mayo Clinic Proceedings
2002; 77: 371–83.

Free at: http://www.mayoclinicproceedings.com/content/77/4/371.long

The proportion of patients achieving a particular benefit,
be it a small, moderate, or large difference, is therefore
much more relevant than a mean difference from the
clinician’s point of view and less likely to mislead. To
calculate the proportion who achieve a MID, one must
consider not only the difference between groups in those
who achieve that improvement but also the difference between
groups in those who deteriorate by the same amount.
(We weren't given that proportion (i.e. those who got worse) in the paper when they told us how many went up by 8 points on the SF-36 PF and/or by two points on the Chalder fatigue scale.)
One must therefore classify patients as improved, unchanged, or deteriorated. In a parallel group trial, the subsequent calculation is not altogether straightforward, and 1
approach involves assumptions about the joint distribution
of responses in the 2 groups.13 Statisticians are developing
alternative approaches to this problem, several of which are
likely to prove reasonable.58 What is not reasonable is
simply to present mean values without taking the second
step that is necessary for clinicians to interpret clinical trial
results effectively.

Distribution-based methods have, in general, 2 fundamental
limitations. First, estimates of variability will differ
from study to study. For instance, if one chooses the between-
patient standard deviation, one has to confront its
dependence on the heterogeneity of the population under
study. If a trial enrolls an extremely heterogeneous population,
an important effect may be small in terms of the
between-person standard deviation and thus judged trivial.
The same effect size, in a trial that enrolls an extremely
homogeneous population, may be large in terms of the
between-person standard deviation, and thus judged extremely
important. The true impact of the change remains
the same, but the interpretation differs radically.

There are at least 2 ways to deal with this problem. One
is to choose the variability from a particular population,
such as the standard deviation of a measure when applied to
the general population at a point in time, and always refer to that same measure of variability. The second is to choose
the standard error of measurement (which we will discuss
subsequently), which is theoretically sample independent.

[..]

BETWEEN-PERSON STANDARD DEVIATION UNITS
The most widely used distribution-based method to date is
the between-person standard deviation. The group from
which this is drawn is typically the control group of a
particular study at baseline or the pooled standard deviation
of the treatment and control groups at baseline. As we have
mentioned herein, an alternative is to choose the standard
deviation for a sample of the general population or some
particular population of special interest, rather than the
population of the particular treatment study under consideration.
An advantage of this approach is that it has been
applied widely in areas of investigation other than QOL.
In case people forget, they chose the baseline standard deviations rather than population standard deviations.
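A toy illustration of the Guyatt point (all figures invented, not PACE numbers): the same raw difference looks large or trivial depending on which SD it is divided by.

```python
# The same raw improvement looks "large" or "small" depending on which
# standard deviation you divide by (all figures below are invented).
raw_difference = 8            # e.g. points on a 0-100 scale

sd_homogeneous_sample = 10    # tightly selected trial sample at baseline
sd_general_population = 25    # broad general-population spread

print(raw_difference / sd_homogeneous_sample)  # 0.8 of an SD: looks "large"
print(raw_difference / sd_general_population)  # 0.32 of an SD: looks modest
```

Which is exactly why the choice of baseline SD rather than population SD isn't a neutral technical detail.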
 

Dolphin

Senior Member
Here's a blog article on The Psychologist website (The British Psychological Society)...
It's, surprisingly, quite a balanced and informed article, especially compared to the recent newspaper articles...
There are a couple of errors or contradictions that I've noticed, especially re pacing/APT and re depression/anxiety...
There's a helpful facility for comments.

Fatigue evidence gathers PACE
http://www.thepsychologist.org.uk/blog/11/blogpost.cfm?threadid=1947&catid=48

Thanks Bob.
Yes, the bit from Ellen Goudsmit means it could have been worse.

I dislike the impression it gives that the treatments led to people returning to normal functioning:
A year after the study start, 30 per cent of CBT patients and 28 per cent of exercise patients had returned to 'normal' function based on their self-report scores on measures of fatigue and physical function. This was a superior outcome, described as 'moderately effective' by the authors, compared with patients who received only specialist medical care, 15 per cent of whom had returned to normal function. In contrast to the CBT and exercise groups, just 16 per cent of patients who undertook pacing therapy plus medical care reached normal function at follow-up, no better than the medical care only group.

[..]

In contrast, another BPS member, Chartered Health Psychologist Dr Peter Spencer at Leeds Trinity University College, welcomed the new findings, which he said reinforced results from his own research in the nineties. 'CBT and graded exercise have been shown to be more effective than any other approach in the management of ME/CFS,' he said. 'However, some people and groups find CBT/graded exercise not suited to them. Given this, perhaps the most important finding is that there is no need for a nihilistic approach to this illness... Many patients, using a range of strategies, have managed their illness and, like me, have made a full recovery.'
This study didn't show the people returned to normal functioning. As has been said plenty of times, a SF-36 PF score is not normal functioning.

I don't know much about Dr. Peter Spencer (wonder is that the correct name?) - I wouldn't mind putting him on a treadmill for three days in a row and then giving him some cognitive tests to see if he's completely normal.

I functioned at a high level with this illness for a few years. But I wasn't 100% well. Nor do I see that GET or CBT regularly leads to this.
 

Dolphin

Senior Member
oceanblue wrote: When Bleijenberg & Knoop said PACE had used a 'strict definition of recovery' it was because they incorrectly thought that PACE has used a healthy population to define 'within the norm'. Which is pretty unimpressive in an editorial.
Indeed. Bad peer review? The editorial even used the word "recovery", not "normal" as PACE does. A misunderstanding or typo on their behalf? Or a lack of fact checking after a naive assumption that the PACE authors would stick to the protocol? Wishful thinking? Public relations or spin from comrades?
Editorials generally aren't peer-reviewed as far as I know (editors may look over them)

e.g.
Pragmatic rehabilitation for chronic fatigue syndrome.
Moss-Morris R, Hamilton W.
BMJ. 2010 Apr 23;340:c1799. doi: 10.1136/bmj.c1799. No abstract available.

Provenance and peer review: Commissioned; not externally peer reviewed.
The SF-36 is used all the time in CFS research especially the SF-36 PF subscale. Bleijenberg has been in the field 20 years. He was part of the panel reviewing the CDC criteria in 2003 that recommended its use. He has used it in lots of his studies.
Indeed as oceanblue pointed out, it was used as part of the "full recovery" definition in:
Is a full recovery possible after cognitive behavioural therapy for chronic fatigue syndrome?

Knoop H, Bleijenberg G, Gielissen MF, van der Meer JW, White PD.

Psychother Psychosom. 2007;76(3):171-6.
They would be very interested in what threshold could be used e.g. what could have been used in that paper.
I don't think they can be given the benefit of the doubt here.
 

Dolphin

Senior Member
This was a very controlled and clean-cut trial which is unlikely to reflect what happens in the real world. So yes, I would imagine the efficacy-effectiveness gap is significant for CBT and GET in PACE. The same could be said for SMC: how many patients in the NHS receive STFU "therapy" and GTFO "therapy" rather than "standardised specialist medical care"? On the other hand, SMC probably didn't involve any exotic treatments some patients may try and find effective. White et al have argued that the poor results and negative reports from patient surveys are the result of badly applied therapy. I think Dolphin(?) has pointed out somewhere that many of these patients went to professional services. This may support the existence of an efficacy-effectiveness gap and is very concerning if CBT and GET will now be further "rolled out" into the NHS.
Yes, no statistically significant difference in the percentages made worse by GET who did it under "NHS specialists" (111/355 = 31.27%) vs any other situation (70/212 = 33.02%): see slide 9 at: http://afme.wordpress.com/5-treatments-and-symptoms/
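For what it's worth, a standard two-proportion z-test on those figures (my own quick check, not from the slide) confirms no significant difference:

```python
from math import sqrt, erfc

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided two-proportion z-test (pooled); returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided, normal approximation
    return z, p_value

# Made worse by GET under NHS specialists vs any other situation
# (counts from the slide).
z, p = two_proportion_z(111, 355, 70, 212)
print(round(z, 2), round(p, 2))  # ~ -0.43, p ~ 0.67: nowhere near significant
```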
 

Dolphin

Senior Member
I went back to that graph of the distribution of SF-36 scores for the general UK adult population (30% of whom are 65 or over) - and where the PACE results fit on it - and tried to calculate/estimate some numbers.
View attachment 5226
For example, the baseline SF-36 score of 38 corresponds to about the bottom 10% of the UK population

The control SMC group at 52 weeks scored around 50, corresponding to the bottom 13% of the population.

The CBT/GET groups at 52 weeks scored about 58, corresponding to the bottom 15% of the population.

So the net effect of CBT or GET, after 1 year, was to move participants from around the bottom 13% of SF-36 scores to around the bottom 15%.

Let's party.

22% of the UK population used in this study reported a long-term illness, though the authors of the relevant study (Bowling) say the face-to-face interview method used probably leads to under-reporting of ill-health.

This probably isn't an entirely fair way of presenting things, but it's at least as fair as the 'within the normal range' stunt pulled by the PACE authors.
Excellent way of putting it. (And that's giving them 58 as the 15th percentile of the adult population - I'd say it could be a bit lower based on figures I've seen).

Percentiles are used quite a lot in some areas e.g. in school we did differential aptitude tests to help us decide career choices. Results of the SATs in the US - used for college entry - are often given as percentiles. Indeed GREs (used for grad school entry) and other tests also use percentiles. I think they are used in other areas e.g. height charts for children. So they would not be an alien concept to lots of people.

The interesting thing about CFS cohorts currently is that other disabling conditions are generally exclusions so they should probably be healthier than the general population.
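If anyone wants to play with the percentile idea, here's a minimal sketch; the reference sample is made up (deliberately ceiling-skewed), not the actual Bowling data:

```python
from bisect import bisect_right

def percentile_of(score, reference):
    """Percentage of the reference sample scoring at or below `score`."""
    ref = sorted(reference)
    return 100 * bisect_right(ref, score) / len(ref)

# Toy reference sample of 100 people (NOT the actual Bowling SF-36 data):
# heavily piled up at the ceiling, as general-population SF-36 PF scores are.
reference = [100] * 60 + [90] * 15 + [75] * 10 + [58] * 8 + [38] * 7

print(percentile_of(38, reference))   # share of the sample at or below 38
print(percentile_of(58, reference))   # share of the sample at or below 58
print(percentile_of(100, reference))  # everyone is at or below the ceiling
```

With a real population sample you'd feed in the actual scores; the calculation is the same.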
 

Dolphin

Senior Member
Apologies for repeating the same long post in two topics, but I just realized I posted this, meant for this thread, in one of the NYT threads about Tuller instead:

OK, since I've brought up questions about the placebo effect and the nocebo effect, I wanted to take a look at the APT arm protocol to see what was really in there. I had, inexcusably, been relying on some chance comment someone had made about how rigid the PACE version of "pacing" was, and I think I may have even been the one to coin the term "the evil version of pacing."

On the face of it, the APT used in PACE doesn't really look *that* bad or that radically unlike what we would understand as "pacing." But even on my first quick read I definitely saw some poison pills in there.

This is the one arm of the trial which claims to use a "pathological" model of ME, i.e. that it is a physical disease. The manual is sprinkled with familiar-sounding quotes from people with ME about how they learn not to do two major errands in one day, for instance, and how they learn to stop *before* they feel really exhausted, stuff like that.

To be noted, though: there is constant emphasis on how this approach should be communicated to patients as "not a cure", and that the best it can do for you is "create the conditions for natural recovery to occur." (So even this model doesn't contemplate ME as a disease that might be incurable, or that an individual might never recover "naturally.") The quotes from patients, though *we* know them all to be accurate representations of how pacing works - I gotta tell you those quotes sound like a lot of doom and gloom to the uninitiated. When I was first ill and would read statements like "don't do the laundry and get the groceries on the same day," I was NOT ready to hear that; in fact it would set off a fury of grief that I had to accept such awful limitations on my life, when I used to do dozens of things each day. The grief process involved in accepting that is a major, major undertaking - and these poor people weren't actually getting any emotional support about it, or any real reason to hope (say, by being told that research is ongoing and someday there might be better treatments available. Because as far as the authors of the PACE trial are concerned, all the necessary research has been done & *they* know the cure already - they're deliberately giving people in this arm a treatment that they themselves think is ineffective.)

On the other hand, the GET and CBT arms are filled with positive messages about self-empowerment, encouraging you not to think of yourself as really (or permanently) limited, "helping" you to "identify" your bad habits that are perpetuating this "vicious cycle" of fear of activity/deconditioning, etc., telling you over and over again that you can overcome this vicious cycle and improve your condition.

And then measure outcomes subjectively, after a good year of inculcating the proper attitude in each group of patients about what they can expect from their therapy.

Now think about the cohort issues: we've got a majority of patients in the trial who would never meet CCC, an unknown number of whom have never even experienced true PEM, a large number of whom probably have primary depression or some other fatiguing condition. Would being trained in "pacing" do these people any good at all? When the therapy is delivered with such a strong underlying message that "this probably won't help you improve at all"?

Even if you somehow accidentally got into this trial with real M.E. (it would have to be a mild case), the expectation that there is some "natural recovery" that might *possibly* occur would certainly lead to disappointment with the APT treatment. As far as I understand pacing, it's not going to be a cure or even make me feel dramatically *better* in any way; what it does is cut down on the worst of the suffering. Not an effect you'd feel if you weren't acutely suffering going in; and those folks were pretty well screened to eliminate anyone who was really suffering physically. And if you actually *were* fatigued because you were depressed and deconditioned, of *course* you wouldn't feel better after your 52 weeks with Eeyore being told to lie down and think of England. And you'd be pretty mad that you didn't even get any "natural recovery."

The folks hanging out with Tigger in the other two arms, where everything is wonderful and the power of positive thinking rules all, are being encouraged to believe they feel better. And, of course, if they really had been deconditioned and depressed, they might feel a bit better, especially in the GET arm - and they'd have that nice "sense of control" that they accomplished it through their own good efforts.

OK, guesses as to which group(s) get the placebo effect and which group gets the nocebo effect?

This is based on a very quick read and I'll have to delve deeper to flesh out these thoughts some more - some things still strike me as odd, such as the pacing group being *forbidden* to use heart rate monitors (?) and rely only on their "perception" of how fatigued they felt ... and the fact that some positive aspects of real pacing seem to have snuck their way into the CBT arm rather than being put in the APT arm.
Lots of good observations there, IMO. Too tired/busy to go through them individually.
 

Dolphin

Senior Member
Seeing as I was the one who raised the issue of statistical purity, I'll reply.

There is no argument or disagreement here.

The fact that the PACE authors have used parametric statistics inappropriately is undeniable and can and should be highlighted. It logically follows that any analysis derived from them has no validity whatsoever. However, this is a point best made by a professional statistician. I believe one of the PACE team is a medical statistician and would be the one to respond to such suggestions. I suspect the ensuing argument would be in the nature of 'oh no we didn't - oh yes you did', etc., and they could probably pull some 'we performed a log transformation on raw scores to approximate a normal distribution' argument or similar waffle (leaving aside that in doing so they would obliterate the very nature of the underlying distribution - but that's another matter). Whether or not highlighting this failing will convince anyone of the underlying 'bad science' is anyone's guess.

The other approach, which we seem to be taking here, is to set aside this legitimate criticism, and to work from the data provided and the underlying statistical assumptions as they appear in the published paper. Even accepting that they are rubbish, they are the basis on which the PACE authors are claiming their limited success. Even accepting their erroneous assumptions, it's relatively easy to point out startling deficiencies in their analyses, not least the pathetically low baselines set for 'normal ranges'.

So my last word on this is: there is a very valid point to be made on the ropey stats which invalidates all their results. But this point may be lost on many (particularly policy makers). There are also many points to be made on the data as presented which are likely to be more meaningful to the average observer. Equating 'normal' functioning with that of a 65+ year old is a fairly damning thing to highlight.

If the PACE authors choose to talk about means and SDs then so be it.
Thanks for that.

And on the last point: they didn't use those means and SDs on an objective test, the 6-minute walk data, for which we know how 65-year-olds do. We weren't told how many got up to 600 m (the sort of distance 65-year-olds score) but if around 10% did, the rest had results similar to the other groups. And of course, there is no talk that the recovered/normally-functioning group is around 10% in size.
One can also add into the mix that 31% of the GET group (the biggest percentage of the 4 arms) didn't do the 6MWT. On average, one would think the sort of people who wouldn't do it would score lower than those who would, especially as they could have done the test twice before.

But going on the theme of what you said, we should use what we have and can criticise them on the SF-36 PF scores.
 

Dolphin

Senior Member
Like most people, this is probably beyond my understanding, but while trying to get to grips with biostatistics (studied with a textbook chosen expressly because it promised jokes) I came across this, that might be relevant:

If the population of all subscribers to the magazine were normal, you would expect its sampling distribution of means to be normal as well. But what if the population were non-normal? The Central Limit Theorem states that even if a population distribution is strongly non-normal, its sampling distribution of means will be approximately normal for large sample sizes (over 30). The Central Limit Theorem makes it possible to use probabilities associated with the normal curve to answer questions about the means of sufficiently large samples.

But since I'm out of my depth, it might not be relevant too. However, I thought your contributions deserved some reply; I do try to read your posts but, apart from the good gags, I don't really grasp them.
As I alluded to earlier, this is only relevant for means. So it makes sense for example to have symmetrical CIs for the means of each group in each arm.

But not all distributions of individual scores are normal or even close to it.
 

Dolphin

Senior Member
Marco said:
SF-36 : Norms, skewness and 'top box' scores in a surgical population.

I'd completely forgotten about this paper and apologies if it has already been posted.

It might be some help with letters :

"A review2 of surgical QoL studies has
found that there were several deficiencies
in the conduct of these studies. One of the
most common problems was inappropriate
statistical analysis. The proper statistical
analysis of data is essential in interpreting
the results of any study.3 Commonly,
data from the SF-36 have been presented as
means with standard deviations or standard
errors of the mean. The basic assumption
of these studies is that the data follow
a normal (gaussian) distribution, having a
“bell-shaped” curve. However, many of
these studies did not perform the statistical
tests4 needed to determine if, indeed, the
data follow the normal distribution necessary
to use this type of statistical analysis."...
http://archsurg.ama-assn.org/cgi/reprint/142/5/473.pdf
This fellow does know what he is talking about, but lacks the collection of merit badges required for weight of authority. Top box score analyses are not accepted standards in any field I'm aware of, however. As a comment suggested, there are many other non-parametric alternatives.

Unfortunately, the number of studies with the same fundamental flaw is large enough for incompetent researchers to outvote objectors. Also, consider that these studies were based on surgery, associated with presumption of organic causation. Had they reviewed psychological literature the state of the art would have been considerably worse.

Even after his presentation, we have one response to the talk indicating someone (McCarthy) fully intends to keep doing what he has been doing. There is no apparent awareness that the behavior he claims to see in data could be the result of the sampling process instead of the population being sampled.

Meanwhile, someone in parliament should ask what will the UK do about 80,000 predicted zombies.:angel:
I think one can argue that "top box" scoring could have a place in health.
One could argue that being in full health should mean that one would score 100/100 on the SF-36 PF (or 11 or less on the Chalder Fatigue Scale). Why should one include the "norm" of the population, which includes a huge range of people?

It wouldn't have to be the only measure used but could be one measure. If they're saying something can lead to recovery/full recovery, let's see the evidence. Isn't there some famous quote like grand claims need grand evidence, or something along those lines?*

One could then compare the proportion who scored the optimal score with the proportion in the population.

* I remember QED in the 1980s (BBC1 show) (think it was that show - it may have also shown up in Arthur C. Clarke's wonders program) where they had a sceptic on about spoon bending and the like (think Uri Geller, for example). The sceptic had a steel bar in a glass box. There was a big prize for anyone who could bend it.
 

Dolphin

Senior Member
Thanks for your detailed responses, biophile.

If I can nit-pick for a moment.
I agree. 28% of the GET group reported feeling "much better or very much better". If even half of these were "recovered" (PACE's original definition in the protocol) and/or a healthy normal distance in the 6MWT ie 600m, the authors would be proudly displaying these figures.
The 28% related to those with SF-36 PF scores of >=60 and Chalder Fatigue Questionnaire scores of <=18.

The percentages across the four groups for
"much better or very much better"
were:
APT: 31%
CBT: 41%
GET: 41%
SMC: 25%
(See Table 5)




Snow Leopard wrote: Yes, I agree and that is why I believe a 6 minute walking test is hardly objective if it does not consider the overall impact on symptoms and activity levels over the next few days.

Exactly. The GET group may merely be pushing themselves more and paying a heftier one-off price for a one-off test, which was absorbed into the statistics on "non-serious adverse events", "serious adverse events", "serious adverse reactions" etc.
I am a bit confused by what you are saying here.
If you are saying that any side-effects from the test would have been picked up in "non-serious adverse events", "serious adverse events", "serious adverse reactions" etc., I'm not sure that is correct for the 52 week test which was at the end of the trial.
 

Dolphin

Senior Member
Dolphin wrote: And to repeat something that has been said at least once, if not more, the change figures of 8 (SF-36 PF) and 2 (CFQ) used for clinically useful difference are artificially small because they are based on the SD, which was artificially small because the same items were used for entry criteria.

Do you mean that the SD is low because the cut-off points skew the distribution?
I mean because they are entry criteria. Take for example weights of adult females in the population - they would have a fairly wide spread. Then say for a trial one only wants adult females who are 8 stone or less (8 stone = 112 lbs = 50.8 kg). If one says that half of 1 SD is sufficient for a clinically meaningful result (for some regime that increases weight?), this might be quite a small amount because they're already bunched together.

Dolphin wrote: Given the model for GET (i.e. symptoms/deconditioning are temporary and reversible), I think there should be an obligation on them to report the figures or otherwise one doesn't know if the model has been tested:

oceanblue wrote: I think it would be a good idea if someone - ideally one of the ME charities - formally wrote to the authors asking for publication of data promised in the protocol but curiously absent in the paper e.g. recovery rates. If that fails, there would then be the option of going to the MRC, who funded the trial to a massive extent and whose Trial Steering Group approved the protocol. It might be tricky for the MRC to turn down such a request.

Yes, shouldn't a publicly funded trial give open access to its raw data?!
Looking for "raw data" would be a request they might turn down. But the spirit of what you are saying could be true - in this case, we're looking for one of the secondary outcome measures. It shouldn't be sensitive information.
 

Dolphin

Senior Member
Problem with self-reports acknowledged in a CBT study on MS

An additional limitation was that outcome assessment in this study depended on self-rated outcome measures. No objective measures exist for subjectively experienced fatigue, so we chose reproducible measures that are sensitive to change. However, self-reports are amenable to response bias and social desirability effects. Future studies could also assess more objective measures of change such as increases in activity levels and sleep/wake patterns using actigraphs or mental fatigue using reaction time tasks.


How refreshingly honest.

from A Randomized Controlled Trial of Cognitive Behavior Therapy for Multiple Sclerosis Fatigue
Thanks oceanblue.

Look at the authors:
Address correspondence and reprint requests to Dr. Rona Moss-Morris, School of Psychology, University of Southampton, Highfield Southampton, SO17 1 BJ, UK. E-mail: R.E.Moss-Morris@soton.ac.uk

Trudie Chalder, PhD, DipBehavPsych
Trudie Chalder was one of the PIs (Principal Investigators) for the PACE Trial.

Rona Moss-Morris has done a GET trial on CFS and co-wrote an editorial on the FINE Trial.

I doubt I've read either of them say anything like that in all their articles on CFS. I mainly read ME/CFS articles and I'm not sure I've seen the phrase "social desirability effects" in any paper, and I'm not sure about "response bias" either - neither is regularly mentioned.

-------
The following is interesting and contrasts with the PACE Trial:
To define clinically significant improvement in fatigue, a normative approach was used (27), which assessed whether patients' fatigue levels after treatment were equivalent to fatigue levels in healthy people. In order to do this, we collected Fatigue Scale data from a matched healthy comparison group recruited during baseline assessments. The 72 MS participants were matched to 72 healthy participants by age, gender, and ethnicity. Healthy participants were screened via interview to exclude the presence of an existing chronic mental or physical illness. The details of these data are presented in the Results section.

[..]

To assess clinically significant improvement, fatigue scores for the MS samples were compared with baseline data from the matched healthy comparison group at each time point. These comparisons are represented graphically in Figure 2. The mean Fatigue Scale score for the healthy comparison group was 12.49 (SD 2.86). Figure 2 shows that there were large significant differences between fatigue reported by the healthy comparison group and fatigue at the start of treatment in the CBT group (t(107) = 11.99; p < .001) and in the RT group (t(109) = 11.21; p < .001). Interestingly, at the end of treatment, the CBT group reported significantly lower levels of fatigue compared with the healthy comparison group normative score (t(107) = 6.67; p < .001). This trend for lower fatigue than healthy participants was maintained at 3 months (t(107) = 4.48; p < .001) and 6 months follow-up (t(107) = 2.51; p < .01). Fatigue levels for the RT group at the end of treatment were equivalent to those of the matched healthy comparison group (t(109) = 1.32; p = .19). At 3 months follow-up, their fatigue was significantly less than the healthy participants' fatigue levels (t(109) = 2.08; p < .05), while the two groups had similar fatigue severity scores at the last follow-up point (t(109) = 0.14; p = .89).

Discussion:
Our data also suggested that the improvements in fatigue
were clinically significant. Fatigue scores for the CBT group
improved after treatment to a level lower than fatigue scores
reported by a matched healthy comparison group, and this
pattern was maintained at both follow-up points. The RT
group improved to a level similar to that of the matched
healthy comparison group, and this pattern was maintained at
the last follow-up point. These results support the findings that
CBT was more effective in improving MS fatigue than RT and
also highlight that the RT group improved their fatigue to a
level comparable to a healthy group.

If one looks at Figure 2, they don't use the mean plus 1 SD but simply the mean.

They didn't use a healthy comparison group in the PACE Trial, just a group who had attended their GP. The figures for that group were higher (i.e. worse): mean 14.2 (SD 4.6).

The healthy comparison group would have been pretty good for the PACE Trial: average age 44.90 (SD 9.59), 75% female.
If they'd used this group's mean + 1 SD (which the MS trial itself didn't do) for the PACE Trial, the threshold would have been 15 rather than 18. That's a bigger difference when 11 is the neutral score.
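Checking the threshold arithmetic directly (figures as quoted above; rounding down to whole scale points is my assumption about how a threshold would be set):

```python
# Mean + 1 SD "normal range" thresholds on the Chalder Fatigue Scale (Likert, 0-33).
healthy_mean, healthy_sd = 12.49, 2.86   # MS trial's matched healthy comparison group
gp_mean, gp_sd = 14.2, 4.6               # GP-attender sample used by PACE

healthy_threshold = healthy_mean + healthy_sd   # ~15.35, i.e. a threshold of 15
gp_threshold = gp_mean + gp_sd                  # 18.8, consistent with PACE's 18
```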

Also, the comparison group only came into the PACE Trial for the "normal functioning"/"normal fatigue" part.
For clinically useful difference, it was:
A clinically useful difference between the means of the primary outcomes was defined as 0.5 of the SD of these measures at baseline,31 equating to 2 points for the Chalder fatigue questionnaire and 8 points for the short form-36.
Compare that to the MS trial where they were looking to see whether the scores got down to 12.49 for the "clinically significant improvement" (possibly they would have used this score + 1 SD if the results had been worse).
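The PACE rule quoted above is just half a baseline SD. The back-calculated baseline SDs of roughly 4 (CFQ) and 16 (SF-36 PF) below are my inference from the stated figures of 2 and 8, not values taken from the paper:

```python
# PACE's "clinically useful difference": 0.5 x the baseline SD of the measure.
def clinically_useful_difference(baseline_sd):
    return 0.5 * baseline_sd

# Working backwards from the stated CUDs of 2 (CFQ) and 8 (SF-36 PF),
# the baseline SDs must have been roughly 4 and 16 respectively.
cud_cfq = clinically_useful_difference(4)    # 2.0
cud_sf36 = clinically_useful_difference(16)  # 8.0
```

Note this rule depends entirely on the spread of the trial sample, whereas the MS trial's criterion was an absolute target (reaching the healthy-group mean of 12.49).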
 

oceanblue

Guest
Messages
1,383
Location
UK
@oceanblue
"For example, the baseline Sf-36 scores of 38 corresponds to about the bottom 10% of the UK population

The control SMC group at 52 weeks scored around 50, corresponding to the bottom 13% of the population.

The CBT/GET groups at 52 weeks scored about 58, corresponding to the bottom 15% of the population.

So the net effect of CBT or GET, after 1 year, was to move participants from around the bottom 13% of SF-36 scores to around the bottom 15%."

Excellent way of putting it. (And that's giving them 58 as the 15th percentile of the adult population - I'd say it could be a bit lower based on figures I've seen).

Percentiles are used quite a lot in some areas e.g. in school we did differential aptitude tests to help us decide career choices. Results of the SATs in the US - used for college entry - are often given as percentiles. Indeed GREs (used for grad school entry) and other tests also use percentiles. I think they are used in other areas e.g. height charts for children. So they would not be an alien concept to lots of people.
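As a toy illustration of how a percentile is read off a reference sample (the numbers below are invented, not real SF-36 population norms):

```python
def percentile_of(score, reference_scores):
    """Percentage of the reference sample scoring at or below `score`."""
    at_or_below = sum(1 for s in reference_scores if s <= score)
    return 100.0 * at_or_below / len(reference_scores)

# Invented reference distribution of SF-36 PF scores for ten people.
reference = [30, 45, 55, 70, 80, 85, 90, 95, 100, 100]
p = percentile_of(58, reference)  # 3 of 10 score 58 or below -> 30.0
```

With a real population sample, the same empirical-CDF idea gives statements like "a score of 58 is around the bottom 15%".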

The interesting thing about CFS cohorts currently is that other disabling conditions are generally exclusions, so (apart from the CFS itself) they should probably be healthier than the general population.
I hadn't realised percentiles were so widely used, that's good.

Very good point about the physical exclusions used by the trial changing who the reference population should be. It just reinforces the point that the reference population should be healthy individuals ie excluding those with chronic ill health, as proposed by Knoop (& White, at the time).
 

oceanblue

Guest
Messages
1,383
Location
UK
I can't remember whether this has been highlighted or not:
One of their references is:

Free at: http://www.mayoclinicproceedings.com/content/77/4/371.long

The proportion of patients achieving a particular benefit, be it a small, moderate, or large difference, is therefore much more relevant than a mean difference from the clinician's point of view and less likely to mislead. To calculate the proportion who achieve a MID, one must consider not only the difference between groups in those who achieve that improvement but also the difference between groups in those who deteriorate by the same amount.
(We weren't given that proportion (i.e. those who got worse) in the paper when they told us how many improved by 8 points on the SF-36 PF and/or by two points on the Chalder fatigue scale.)
I know I'd meant to flag that up when I read it, but not sure I managed to post it here. Has anyone really got to the bottom of the data for harm/deterioration in the trial? They seem incomprehensible to me, and I wondered if that was just as the authors had planned.
 

oceanblue

Guest
Messages
1,383
Location
UK
I think one can argue that "top box" scoring could have a place in health.
One could argue that being in full health should mean scoring 100/100 on the SF-36 PF (or 11 or less on the Chalder Fatigue Scale). Why should one include the "norm" of the population, which covers a huge range of people?

It wouldn't have to be the only measure used, but it could be one measure. If they're claiming something can lead to recovery/full recovery, let's see the evidence. Isn't there some famous quote along the lines of extraordinary claims requiring extraordinary evidence?
Nice quote.

Re Top Box, I think 100/100 might be a bit too high. I've tried to estimate the data from the graph in the Bowling study - only 57% scored in the top box, and it looks like the top box is scores of 95 or 100.
[attached image: mySF36.jpg]
 

oceanblue

Guest
Messages
1,383
Location
UK
Look at the authors:
Trudie Chalder was one of the PIs (Principal Investigators) for the PACE Trial.
Rona Moss-Morris has done a GET trial on CFS and co-wrote editorial on FINE Trial.
Yes, I'm sure Professors Chalder and Moss-Morris didn't mean to be so helpful or so reasonable. Some kind of collective brainstorm?
The following is interesting and contrasts with the PACE Trial:
The mean Fatigue Scale score for the healthy comparison group was 12.49
(SD 2.86)

If one looks at Figure 2, they don't use the mean plus 1 SD but simply the mean.

They didn't use a healthy comparison group in the PACE Trial, just a group who had attended their GP. The figures for that group were higher (i.e. worse): mean 14.2 (SD 4.6).

The healthy comparison group would have been pretty good for the PACE Trial: average age 44.90 (SD 9.59), 75% female.
If they'd used this group's mean + 1 SD (which the MS trial itself didn't do) for the PACE Trial, the threshold would have been 15 rather than 18. That's a bigger difference when 11 is the neutral score.

Also, the comparison group only came into the PACE Trial for the "normal functioning"/"normal fatigue" part.
For clinically useful difference, it was:
Compare that to the MS trial where they were looking to see whether the scores got down to 12.49 for the "clinically significant improvement" (possibly they would have used this score + 1 SD if the results had been worse).
Very interesting to see a mean and SD for a healthy population. This MS study is another using a healthy population for reference; there really doesn't seem to be a precedent for PACE to use a general population.
 

oceanblue

Guest
Messages
1,383
Location
UK
@ biophile Dolphin wrote: And to repeat something that has been said at least once, if not more, the change figures of 8 (SF-36 PF) and 2 (CFQ) used for the clinically useful difference are artificially small, because they are based on the SD, which was artificially small because the same items were used as entry criteria.

I mean because they are entry criteria. Take, for example, the weights of adult females in the population - they would have a fairly wide spread. Then say a trial only wants adult females who are 8 stone or less (8 stone = 112 lb = 50.8 kg). If one says that half of 1 SD is sufficient for a clinically meaningful result (for some regime that increases weight, say), this might be quite a small amount, because the eligible weights are already bunched together.
According to Figure 1, only 33 people (5% of total recruits) were excluded on the entry criterion of bimodal CFQ < 6, as opposed to over 300 (50% of total recruits) for SF-36 > 65. So potentially that's a very big effect on the SD for the SF-36. On top of this is the exclusion of those too ill to participate, which probably has a huge effect (though maybe not for the CFQ, as the ceiling effect means most of those excluded as too ill would have scored the max of 33 anyway).
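Dolphin's weight analogy is easy to demonstrate by simulation: truncating a population at an entry criterion shrinks the SD, so any threshold defined as 0.5 × SD shrinks with it. All numbers below are invented for illustration:

```python
import random

random.seed(1)

def sd(xs):
    """Sample standard deviation."""
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5

# Invented population of adult female weights in kg.
population = [random.gauss(65, 12) for _ in range(10_000)]

# Entry criterion: "8 stone or less" (50.8 kg), as in the analogy.
eligible = [w for w in population if w <= 50.8]

half_sd_full = 0.5 * sd(population)   # 0.5 x SD in the whole population
half_sd_trunc = 0.5 * sd(eligible)    # 0.5 x SD after range restriction: smaller
```

The same logic applies to PACE: selecting only people with SF-36 PF ≤ 65 compresses the baseline SD, and with it the 8-point "clinically useful difference" derived from it.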