• Welcome to Phoenix Rising!

    Created in 2008, Phoenix Rising is the largest and oldest forum dedicated to furthering the understanding of, and finding treatments for, complex chronic illnesses such as chronic fatigue syndrome (ME/CFS), fibromyalgia, long COVID, postural orthostatic tachycardia syndrome (POTS), mast cell activation syndrome (MCAS), and allied diseases.

    To become a member, simply click the Register button at the top right.

PACE Trial and PACE Trial Protocol

Angela Kennedy

Senior Member
Messages
1,026
Location
Essex, UK
Off topic - sorry - but oceanblue's post in this thread has got me back up on another hobby-horse...



That would be perfect, please somebody do give this a go and send him our best analysis, but I hold out no hope at all there myself.

It sounds like a pipe-dream to me. It really, really would be the impact we're looking for. It is exactly the place where the story should appear. It is exactly the newspaper that should be 'guarding' us. Goldacre's purported agenda of exposing bad science is exactly where we would expect to turn for support here.

Unfortunately, it's not about that at all, as far as I can see. He seems more interested in attacking those things against which he's prejudiced and defending the scientific status quo from anything "a bit weird". He seems to have the same agenda about neuroscience, mind/body etc as the Wessely crowd. He seems to me to be a part of the same mob. They seem to all be joined at the hip.

This is near enough the epicentre of the problem for us in the UK IMO. Who would we expect to defend us if not the Guardian? That's exactly where I expect to find our support - but instead I find this man, who looks to me like nothing but a wolf in sheep's clothing who is busy pulling the wool over everyone's eyes and merely pretending to be the very thing we so desperately need.

What hope for us, and what place in the Guardian, for a man who hosts an online forum (don't buy into the line of 'nothing to do with me guv, I just host it under the name of my book') with a main section dedicated to:

"The Great British Sport of moron-baiting"

Is this Guardian journalist morality in the post-modern era? Should it have been renamed "New Guardian" when it got its makeover, perhaps?

From everything I've seen, Goldacre and his army go after whatever they deem 'bad science' with aggression, rudeness, disrespect and a massive dose of swearing to make everybody laugh and switch off their brains. I'm judging mainly by the BS forum, some Goldacre videos, and Martin J. Walker, so my sources are limited I admit, but from everything I've seen and heard on the forum especially, they determine what is 'bad science' primarily in accordance with their prejudices, their assumptions, and what they 'just know' to be absolute rubbish. The confirmation bias in their way of thinking is enormous; their arrogance breathtaking; their rudeness and disrespect disgusting to me. I see almost no genuine critical thinking, thoughtful or sober scientific critique in the movement - just a crusade for the concept of science as they understand it, and violently against any other way of looking at the world. And the movement is so easily exploited by vested interests that even the really good science, emerging science, and human-focused science gets trampled underfoot. That's the way the movement he's part of looks to me.

BUT: LET HIM PROVE ME WRONG!

If Goldacre's column - or indeed any other Guardian journalist - addresses the PACE trial and rips it to shreds, exposing how CBT/GET is a con, how any attempt at biomedical research into ME/CFS has been suppressed, and how a group of severely ill people have been brushed under the carpet to save money, for decades....well I'll believe it when I see it...and I'll be delighted to have been proved wrong and made to look foolish. And I'll celebrate with the loudest whoops and cheers of delight - because we will finally be cooking with gas.

I might even start buying and reading the thing again; until then, I simply don't trust it any more.

If there really still was a 'Guardian', would we still be in the mess we're in now?

http://www.scribd.com/doc/8401751/C...n-Goldacre-Quackbusting-and-Corporate-Science

Nice post Mark. Summarises the situation beautifully.

Me too- would LOVE to be proven wrong about Goldacre in the way you fantasise - but I won't be holding my breath...
 

oceanblue

Guest
Messages
1,383
Location
UK
Remember that a neutral value is 11 - that's if one answered same as usual to each of the 11 questions.
Very helpful point, and it fits well with the reported "No disease/current health problem 11.2". Presumably the SD for this is smaller than the 4.0 reported for the whole sample, because it's a more homogeneous group. Even so, it would lower the mean + 1 SD threshold from the 18 used in the trial to 15, thereby reducing the number of 'recovered' patients.

However, it is possible that Norwegians are a bit healthier than us Brits! This could account for the overall mean being lower than in the UK (12.2 vs 14.2) - though it's quite a big difference. I don't know if there are any figures on the relative health of Norwegians and Brits?

Interestingly, 17.6% of the CFS patients had a score of 18 or less before they were treated. This is the score used in the final PACE Trial paper to represent normal fatigue levels, and the one described in the Knoop & Bleijenberg editorial as representing recovery.

Fascinating. I wonder what percentage of PACE trial participants had a score of 18 or less at the start of the trial? The answer should, of course, be zero for the recovery definition to make sense. In fact there should be plenty of clear water between any patient at entry and 'recovered'.
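
As a check on the arithmetic, here is a minimal sketch of the 'mean + 1 SD' normal-range calculation in Python, using only the figures quoted in this thread (the 'no disease' mean of 11.2 paired with the whole-sample SD of 4.0; the subgroup SD, likely smaller, was not reported, so this is an upper bound on the threshold):

    # PACE-style normal range for fatigue: a score <= mean + 1 SD of the
    # reference sample counts as 'normal'.
    def normal_fatigue_ceiling(mean, sd):
        return mean + sd

    # 'No disease/current health problem' mean with the whole-sample SD:
    print(normal_fatigue_ceiling(11.2, 4.0))  # 15.2 -> a threshold of 15, vs the 18 used in the trial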
 

anciendaze

Senior Member
Messages
1,841
overview for response

Afraid I have not done my part in analyzing details of the protocol on this thread. I can only stomach so much turgid prose before becoming grumpy (as last night) or collapsing.

Generally speaking, the fine detail makes sense if you buy into the absurd context in which it appears. After reading Dr. Hooper's comparison of criteria used at different times, and gaps between stated intent and definitions actually present in the references cited, I have concluded we don't even know which time-variable definition was in use when particular data were collected.

The suggestion concerning acquiescence bias is reasonable, but there are so many documented biases concerning authority in the literature I'm not sure which to emphasize. The huge absurdity in this aspect is the use of trained experts to argue with the patient about what is going on in the patient's own head. No evidence for mind reading has been presented. While some patients had recognized mental problems, the majority had no identified problem outside this diagnosis. Dr. White has even said, in effect, doctors don't know what is going on inside most patients.

It is no accident that every portion of the trial was predicated on the belief that an authority figure is required to tell the patient how to get better. No control group in which patients advised other patients was used, for excellent reasons. The whole thrust of this business has been aimed at preserving authority. In such a situation, quibbles about a little statistical bias seem trivial.

The point I would bear down on is that the study shows a large proportion of the patients simply did not respond to a year of treatment -- a substantial effort requiring persistence. Moreover, nothing presented in the study differentiates these patients except their lack of response. The authors don't know which patients they can treat, by any studied means, bringing the definition of the cohort into question.

The largest selection effect looks to be refusal to participate in the study, caused by mistrust we all appreciate. Here is my latest semi-professional prose concerning that:

The total of those self selected out of the study, either declining to participate or failing to complete, is comparable to those completing the study. This is itself a substantial selection effect which will not disappear unless confidence in the NHS and MRC becomes considerably more robust. Such a selection effect can produce evidence of improvement in the absence of any effective treatment.
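
To make that concrete, here is a toy simulation (purely illustrative numbers of my own) of how dropout concentrated among patients who feel worse can produce an apparent group improvement with zero treatment effect:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 600
    baseline = rng.normal(28, 3, n)            # illustrative entry fatigue scores
    followup = baseline + rng.normal(0, 4, n)  # pure fluctuation: no treatment effect

    # Suppose patients whose scores worsened are far more likely to drop out:
    worsened = followup > baseline
    dropout_prob = np.where(worsened, 0.5, 0.1)
    completed = rng.random(n) > dropout_prob

    change = followup - baseline
    print(f"mean change, all patients: {change.mean():+.2f}")            # ~0
    print(f"mean change, completers:   {change[completed].mean():+.2f}") # negative, i.e. apparent 'improvement'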

One final comment concerning that Lancet cover picture. Last night I imagined this being waved around in medical school. "Class, what's the take-home message from this issue of The Lancet? You, Poindexter."

P: "These people are real losers."
 

Dolphin

Senior Member
Messages
17,567
Very helpful point, and it fits well with the reported "No disease/current health problem 11.2". Presumably the SD for this is smaller than the 4.0 reported for the whole sample, because it's a more homogeneous group. Even so, it would lower the mean + 1 SD threshold from the 18 used in the trial to 15, thereby reducing the number of 'recovered' patients.

However, it is possible that Norwegians are a bit healthier than us Brits! This could account for the overall mean being lower than in the UK (12.2 vs 14.2) - though it's quite a big difference. I don't know if there are any figures on the relative health of Norwegians and Brits?

Fascinating. I wonder what percentage of PACE trial participants had a score of 18 or less at the start of the trial? The answer should, of course, be zero for the recovery definition to make sense. In fact there should be plenty of clear water between any patient at entry and 'recovered'.
Thanks.

Of course, with the physical functioning scale there is a more obvious overlap: people with scores of 65 or less could get in, yet under their recovery-type/normal-functioning definition a score of 60 fits. I really hope that point is published in some letter. It will make them look very dodgy.
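
That overlap is stark enough to fit in three lines of Python (thresholds as given in the trial papers: entry requires an SF-36 PF score of 65 or less, the normal-functioning definition accepts 60 or more, and the scale moves in steps of 5):

    pf_scores = range(0, 101, 5)
    overlap = [s for s in pf_scores if 60 <= s <= 65]
    print(overlap)  # [60, 65]: sick enough to enter the trial, yet already 'normal'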
 

Dolphin

Senior Member
Messages
17,567
The point I would bear down on is that the study shows a large proportion of the patients simply did not respond to a year of treatment -- a substantial effort requiring persistence. Moreover, nothing presented in the study differentiates these patients except their lack of response. The authors don't know which patients they can treat, by any studied means, bringing the definition of the cohort into question.
Unfortunately, the authors have promised us this in some form:
(from main paper) We plan to report relative cost-effectiveness of the treatments, their moderators and mediators, whether subgroups respond differently, and long-term follow-up in future publications.

(from protocol paper)
Predictors
1. Sex

2. Age

3. Duration of CFS/ME (months)

4. 1 week of actigraphy [18] (as initiated at visit 1 with the research nurse)

5. Body mass index (measure weight in kg and height in metres)

6. The CDC criteria for CFS [1]

7. The London criteria for myalgic encephalomyelitis [40]

8. Presence or absence of "fibromyalgia" [41]

9. Jenkins sleep scale of subjective sleep problems [37]

10. Symptom interpretation questionnaire [34]

11. Preferred treatment group

12. Self-efficacy for managing chronic disease scale [32]

13. Somatisation (from 15 item physical symptoms PHQ sub-scale) [35]

14. Depressive disorder (major and minor depressive disorder, dysthymia by DSMIV) (from SCID) [30]

15. The Hospital Anxiety and Depression Scale (HADS) [38] combined score

16. Receipt of ill-health benefits or pension

17. In dispute/negotiation of benefits or pension

18. Current and specific membership of a self-help group (specific question)
It usually gets worse here, e.g. 'being in receipt of a disability payment is bad for one's health' is one claim that sometimes comes out.
 

Sean

Senior Member
Messages
7,378
It is no accident that every portion of the trial was predicated on the belief that an authority figure is required to tell the patient how to get better. No control group in which patients advised other patients was used, for excellent reasons. The whole thrust of this business has been aimed at preserving authority.

Hard to come to any other conclusion. Basically a last-ditch, power-grabbing, face-saving exercise, hence their extravagant public relations blitz for these PACE results, and the silence from them on the FINE results (which couldn't possibly be spun in their favour).

It usually gets worse here, e.g. 'being in receipt of a disability payment is bad for one's health' is one claim that sometimes comes out.

This claim (along with the one about membership of a patient group) particularly bugs me.

Being on a pension, and (independently) being a member of a patient group, are exactly what you would expect in a group of people who are very sick over a long period, are not being given effective treatment, and are being socially, politically and financially marginalised.

What is the rate of membership of patients groups in disorders like MS or Parkinson's disease, and the correlation between disease severity and pension receipt/patient group membership? And how much difference does it make (particularly to patient group membership) if the disorder is also a seriously disputed one? I do not see these being factored into their claims.

The reality is that ME/CFS patients who are receiving welfare benefits (and probably also those who are members of patient groups) are already among the sickest and least employable, and even using the most generous possible definition of recovery they are the least likely to 'recover' to the point where they become reasonably employable, no matter how much CBT/GET they do.

To the extent that being on a disability pension is bad for one's health - and I have no doubt it is not optimal for it - it is in no small part because some people in power and the broader community try hard to make sure that getting, keeping, and surviving on a pension is as difficult, unpleasant, stigmatising, humiliating, degrading and isolating an experience as possible, including sometimes denying us the basic resources to maintain what health, functioning and basic dignity we have left.

It is a particularly nasty self-fulfilling ideological prophecy: 'We believe that being on a disability pension is inherently a Very Bad Thing. And we are going to make damn sure it is.'



We can turn the question around: Is being seriously and chronically sick/disabled an independent risk factor for requiring a disability pension? Of course it is. Indeed it is the main risk factor, far outweighing any others.

And what is the alternative for people in such a situation?

What the PACE (and FINE and related) studies actually show is that these patients are on a pension for a very good reason, and that being in receipt of a pension is more a consequence of poor health, than a cause of it.
 

anciendaze

Senior Member
Messages
1,841
bounded random walks

After commenting on the 'PACE by random number generator' topic, I started rethinking some of my assumptions about the way a protocol might produce desired results. This goes outside of ordinary discussion of statistical bias.

There are two distinct issues: the protocol used and the way the protocol was implemented. Intervention to prevent adverse events leading to adverse outcomes for the patient is typically seen as a separate problem from experimental design. Patient welfare is presumed to take precedence over unbiased statistics. Concentrating on adverse outcomes they were unable to prevent ignores the effect of interventions to prevent those outcomes.

What has occurred to me is that GET was the treatment where the most concern over adverse events was concentrated. Do any sleuths who have crawled through this morass have any evidence there were more interventions with patients assigned to GET?

As far as I am concerned, the results of CBT speak for themselves. Patients expressed a subjective preference because it gave them someone to talk to, without significant change in performance. A control using barbers, hairstylists and cab drivers (and possibly bartenders, though with extra controls) should completely dispose of those results.
 

Dolphin

Senior Member
Messages
17,567
Note that this was the aim of the study:

"The aim of treatment was to change the behavioural and cognitive factors assumed to be responsible for perpetuation of the participants symptoms and disability."
YOu can come to a couple potential conclusions about this: (a)their assumption - that behavioral and cognitive factors are responsible for the symptoms and disability in this disorder is false - since their treatments had only moderate effects on CFS (and only on a subset of individuals) and/or b) they don't have particularly effective ways of addressing those factors.
Astute point there, Cort.
 

Esther12

Senior Member
Messages
13,774
I've still not got my head around their figures for what's 'normal', especially the SF-36 PF scores. I've collected a series of posts here that I want to re-read; I think we could make further progress on this.

This is the article they use for their figures: http://jpubhealth.oxfordjournals.org/content/21/3/255.full.pdf+html I'm not quite sure which ones they use, and am giving up on trying to work it out tonight.

I also need to put more work into Dolphin's post @ 220 on the Chalder score.

Yes, extraordinary that they should make such a big change without even justifying it, or listing it under 'trial limitations' in the Discussion section. Apparently the 'per protocol' analyses are in the Web appendix but this isn't open access and I'm trying to get a copy of it.

So, to recap on how they moved the goal posts:

Primary outcomes were fatigue and physical function, as measured by the Chalder Fatigue Scale and the SF-36 PF subscale respectively - these haven't changed. What has changed is how 'improvement' is defined, and there are no prizes for guessing in which direction the change works.

Fatigue

Protocol says:

So, with a mean baseline fatigue score of approx 28 a 50% reduction would require a typical improvement of 14 points to count as a positive outcome.

The paper switches to Likert (0,1,2,3) scoring - I have no problem with this - but now a reduction of just 2 points is required for a participant to count as improved.

To clarify: the protocol says a 14-point improvement in fatigue score is required; the published paper now says 2 will do.

The justification for these changes is

It seems to be based on ref 31, but I think they need to spell out the reason for this major change.
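
For readers unfamiliar with the two scoring systems, here is a sketch of how the same 11-item Chalder questionnaire yields both the 0-33 Likert scale and the 0-11 bimodal scale (the scoring rules are standard; the example responses are invented):

    # Each of the 11 items is answered 0-3.
    #   Likert:  items scored 0,1,2,3 as-is     -> total range 0-33
    #   bimodal: answers 0,1 -> 0 and 2,3 -> 1  -> total range 0-11
    def chalder_scores(responses):  # responses: 11 ints in 0..3
        likert = sum(responses)
        bimodal = sum(1 for r in responses if r >= 2)
        return likert, bimodal

    # A heavily fatigued entrant (invented answers):
    print(chalder_scores([3, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2]))  # (28, 11)

On the Likert scale a 50% reduction from the trial's typical baseline of 28 is the 14 points mentioned above; the paper's new criterion of 2 points is a small fraction of that.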

Physical Function

The protocol says

The average baseline SF-36 PF score is around 38 so typically an improvement of 19 points is required by the protocol.

The paper says an improvement of 8 points for each participant will do.

The scale only scores in 5-point intervals, so that is in effect a 20-point improvement required, on average, by the protocol, versus 10 points required by the paper.

Finally, the protocol was simply going to look at the proportion of improvers in each therapy group and use that as the measure of success (they expected 60% of the CBT group to improve this much!). The paper just looks at the average score in each group.

Of course, there may be a sound rationale for these changes (other than "shit, our trial is going to bomb, let's change how we define success") but if there is, they need to spell it out and explain exactly why they departed from their carefully considered protocol. As I understand it, the CONSORT guidelines exist precisely to stop this sort of gerrymandering.

Not much time to do this but I see Dolphin has been doing some great digging - here are a few comments from me.

Using GP attenders as a sample of a healthy population is patently ridiculous, but it's worse than that: PACE set the threshold one standard deviation into the unhealthy tail of the distribution (below the mean for physical function, above it for fatigue), which statistically corresponds to roughly the 15th percentile. This means that any patient less fatigued than the most fatigued ~15% of patients attending GP surgeries counts as 'recovered'.
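
The percentile claim is easy to verify under a normality assumption (a sketch; real score distributions are skewed, so this is approximate):

    from math import erf, sqrt

    def normal_cdf(z):
        return 0.5 * (1 + erf(z / sqrt(2)))

    # Fraction of a normal population lying beyond one SD on the unhealthy side
    # (below mean - 1 SD for physical function, above mean + 1 SD for fatigue):
    print(f"{normal_cdf(-1.0):.3f}")  # 0.159, i.e. roughly the worst 16%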

The Oxford criteria require that "c) The fatigue is severe and disabling". However, the PACE trial went further than this, requiring for the disability (activity) element an SF-36 PF subscale score of 65 or less. Thus the PACE trial recruitment criteria set a disability threshold of 65 or less.

Using this threshold, 12% (251 of the 2,080 diagnosed as meeting the Oxford criteria, from Figure 1) were excluded from the trial because of a PF score of 70 or more (let's assume it was exactly 70). Yet these excluded patients had already been diagnosed by trial clinicians as suffering from 'severe & disabling fatigue'.

So we know that, according to PACE clinicians, a PF score of 70 counts as disabling fatigue.
Yet, when it comes to assessing recovery, a PF score of 60, i.e. 10 points lower, counts as 'normal'.

Black is white, the moon's a balloon and we're a bunch of malingerers.

Here's some UK normative data:

Short form 36 (SF36) health survey questionnaire: normative data for adults of working age. BMJ. 1993 May 29;306(6890):1437-40. Jenkinson C, Coulter A, Wright L. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1677870/pdf/bmj00022-0017.pdf

Agreed.

Here are some population norms for South Australia: http://www.health.sa.gov.au/PROS/portals/0/sa-pop-norm-SF-36.pdf . It's the PF (Physical Functioning) scale.

So if one looks at 35-44 year olds, the 25th percentile is 90 with 45% scoring 100/100.
And they want to tell us 60 is a normal score!
----
At http://www.sf-36.org/nbscalc/index.shtml , one can see the population norms for Sweden, as mean (SD):

Physical Function(ing): 87.9 (19.6)
Role-Physical: 83.2 (31.8)
Bodily Pain: 74.8 (26.1)
General Health: 75.8 (22.2)
Vitality: 68.8 (22.8)
Social Functioning: 88.6 (20.3)
Role-Emotional: 85.7 (29.2)
Mental Health: 80.9 (18.9)
Physical Composite Score (PCS): 50.0 (10)
Mental Composite Score (MCS): 50.0 (10)

Also the US, Norway and Canada (I had copied the Swedish ones before which is why I did them).

Note population norms for healthy people and/or working age adults would be more interesting - including people in their 80s (say) shouldn't count. I think I've seen them before for the SF-36.

Dolphin, I think I love you! :D

Sorry, getting carried away here because there are some gems in that paper that expose the flaws in the PACE choice of thresholds.

This study is based on completed questionnaires from 9,332 people of working age in central England.

The survey gives SF-36 PF scores for the whole sample, for people who reported a longstanding illness and, crucially, people who did not report a longstanding illness. This last group might be the best estimation of 'healthy'. Here are the mean SF-36 scores with SDs in brackets and the threshold that would result from using the PACE formula of "mean minus 1 SD":

'Healthy': 92.5 (13.4) = 79.1
'Chronically ill': 78.3 (23.2) = 55.1
'Population*': 89 (16) = 73
*oh damn, they don't seem to give this separately, this is my guesstimation from looking at the data they do give

Since the PF scale only scores in 5-point intervals (e.g. 60, 65, 70), these translate into PF threshold scores of:
Healthy = 80, population = 70 or 75. PACE used 60.

slightly more complex point
As a bonus, they provided SF-36 PF scores for people who had consulted a doctor in the two weeks prior to completing the questionnaire. This is a pretty close approximation to the 'GP attenders' used to establish the norm data for the fatigue scale. The scores are:
81.6 (23) = 58.6

This shows that not only are the GP attenders substantially less well than 'healthy' people (81.6 vs ~89), they also have a much bigger SD, which has the effect of lowering the threshold even further (60 vs 70 or 75). Obviously this is for PF scores, not fatigue scores, but it does illustrate how GP attenders differ from the normal population.
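
Here is the whole 'mean minus 1 SD' calculation above in one place, including the rounding to the PF scale's 5-point steps (figures as quoted from Jenkinson et al.; the 'population' pair is the guesstimate flagged above):

    groups = {
        "healthy (no longstanding illness)": (92.5, 13.4),
        "chronically ill":                   (78.3, 23.2),
        "population (guesstimate)":          (89.0, 16.0),
        "recent GP attenders":               (81.6, 23.0),
    }
    for name, (mean, sd) in groups.items():
        raw = mean - sd
        rounded = 5 * round(raw / 5)   # PF scores come only in steps of 5
        print(f"{name}: {raw:.1f} -> {rounded}")
    # healthy -> 80, chronically ill -> 55, population -> 75, GP attenders -> 60.
    # PACE's chosen threshold for 'normal' physical function: 60.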

From protocol paper:

"We will count a score of 75 (out of a maximum of 100) or more, or a 50% increase from baseline in SF-36 sub-scale score as a positive outcome. A score of 70 is about one standard deviation below the mean score (about 85, depending on the study) for the UK adult population [51,52].

50. Ridsdale L, Darbishire L, Seed PT: Is graded exercise better than cognitive behaviour therapy for fatigue? A UK randomised trial in primary care. Psychol Med 2004, 34:37-49.

51. Jenkinson C, Coulter A, Wright L: Short form 36 (SF-36) Health Survey questionnaire: normative data from a large random sample of working age adults. BMJ 1993, 306:1437-1440. "


Actually, the one you took the data from is the one they referenced in their original paper (ref 51). This makes the point much stronger if it is quoted.
 

Esther12

Senior Member
Messages
13,774
They use this paper to justify changes to their primary measures: http://www.ncbi.nlm.nih.gov/pubmed/19455540

It prompted three replies and then a rejoinder. Maybe they cited a controversial statistics (!) paper in order to justify dubious manoeuvres? Is anyone here likely to have a good enough understanding of statistics to comment?

PACE also cited this paper on meaningful measures in medicine: http://www.mayoclinicproceedings.com/content/77/4/371.long

I wanted to paste it again here (Oceanblue posted it earlier) to remind me to go through it when I'm feeling less demented.
 

Doogle

Senior Member
Messages
200
Ambiguity on the London criteria form in the PACE Protocol

Now for the near-mythical 'London' M.E. criteria, as applied in the PACE trial. Again, starting with patients meeting the Oxford Criteria, there are 4 mandatory symptoms for inclusion and one exclusion:

  1. Exercise induced fatigue precipitated by trivial mental or physical exertion relative to the patient's previous exercise tolerance
  2. Impairment of short-term memory and loss of powers of concentration
  3. Fluctuations of symptoms ('the usual precipitation by mental/physical exercise should be recorded but is not necessary to meet criteria')
  4. Symptoms should have been present for at least 6 months and should be ongoing [nb unlike Oxford & CDC criteria that allow intermittent symptoms]
  5. There is no primary depressive illness and no anxiety disorder/neurosis
So that's the London Criteria - as defined by the PACE trial.

I have been perplexed as to how the London criteria patients could score essentially the same on fatigue and physical function as all participants and the Reeves criteria patients in Figure 2 of the PACE paper, if patients with primary depressive illness and anxiety disorder/neurosis had been weeded out. Well, they may not have been. At the top of the London criteria form it states: Criteria 1 to 4 must be met for a diagnosis of ME to be made.


London criteria.jpg
 

Angela Kennedy

Senior Member
Messages
1,026
Location
Essex, UK
I have been perplexed as to how the London criteria patients could score essentially the same on fatigue and physical function as all participants and the Reeves criteria patients in Figure 2 of the PACE paper, if patients with primary depressive illness and anxiety disorder/neurosis had been weeded out. Well, they may not have been. At the top of the London criteria form it states: Criteria 1 to 4 must be met for a diagnosis of ME to be made.


London criteria.jpg

Thanks for this Doogle.

Well-

1. We know there is some problem with the inclusion of post-exertional problems (here it just says 'fatigue', which could mean emotional weariness, to be honest!) in White's possibly fiddled-with (I mean 'tweaked') version, according to Hooper's reply to Jameson. But who doesn't get tired after exertion? How do you define 'trivial'?

2. Impairment of short-term memory and loss of concentration- found in MDD?

'Usually coupled with' means not necessary. Some of those symptoms are found in MDD, OR ANYBODY - emotional lability is very problematic as a concept, for all sorts of reasons, because it tries to quantify what is actually a ubiquitous experience (changeable emotions!).

3. Fluctuation of symptoms. Again - in the world of the well, or of the flu - who doesn't experience fluctuation in symptoms? And this time it is 'precipitated by exercise' (not exertion? That can confuse matters).

4. Six months - who cares how long, to be honest? MDD can last that long of course, as can chronic overwork and burnout.

The level of linguistic ambiguity here is comparable to the 'CFS' diagnoses, which also makes it problematic. That's before we get into the other problems around 'London' and its history.
 

Marco

Grrrrrrr!
Messages
2,386
Location
Near Cognac, France
Clinical significance

The Minimum Important Difference

Single-anchor methods generally aim to establish differences in score on the target instrument that constitute trivial, small but important, moderate, and large changes in QOL. However, they generally put great emphasis on a threshold that demarcates trivial from small but important differences: the minimum important difference (MID). One popular definition of the MID is the smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a change in the patient's (health care) management.[31]

Several factors have made the concept of MID useful. First, it ties the magnitude of change to treatment decisions in clinical practice. Second, the smallest important difference one wishes to detect helps with the study design and choice of sample size; this definition also links to a crucial decision in trial design. Third, it emphasizes the primacy of the patient's perspective and implicitly links that perspective to that of the physician. Since discussions of the ethics of clinical care increasingly emphasize shared decision making, this link is useful. Finally, the concept appears easily understood by clinicians and investigators (although there is little experience with patients).

A limitation of this definition of MID is that it does not explicitly address deterioration. One way to address this problem would be to modify the definition as follows: the MID is the smallest difference in score in the domain of interest that patients perceive as important, either beneficial or harmful, and which would lead the clinician to consider a change in the patient's management.

An alternative to labeling a change as being of minimum importance is to think of it as subjectively significant.[32] This latter term emphasizes that one can have an important deterioration and an important improvement. It also makes explicit that the meaningfulness of change over time is based entirely on the patient's self-assessment of the magnitude of change. Thus, the term "subjectively significant" is congruent with the concept that QOL is a subjective construct and that the prime assessor of QOL status and change in that status is not an observer, but the patient.

Interpreting small differences in functional status: the Six Minute Walk test in chronic lung disease patients


http://ajrccm.atsjournals.org/cgi/c...903fd69e309251d2cf5cccb7&keytype2=tf_ipsecsha

Functional status measurements are often difficult to interpret because small differences may be statistically significant but not clinically significant. How much does the Six Minute Walk test (6MW) need to differ to signify a noticeable difference in walking ability for patients with chronic obstructive pulmonary disease (COPD)? We studied individuals with stable COPD (n = 112, mean age = 67 yr, mean FEV1 = 975 ml) and estimated the smallest difference in 6MW distances that was associated with a noticeable difference in patients' subjective comparison ratings of their walking ability. We found that the 6MW was significantly correlated with patients' ratings of their walking ability relative to other patients (r = 0.59, 95% confidence interval [CI]: 0.54 to 0.63). Distances needed to differ by 54 m for the average patient to stop rating themselves as "about the same" and start rating themselves as either "a little bit better" or "a little bit worse" (95% CI: 37 to 71 m). We suggest that differences in functional status can be statistically significant but below the threshold at which patients notice a difference in themselves relative to others; an awareness of the smallest difference in walking distance that is noticeable to patients may help clinicians interpret the effectiveness of symptomatic treatments for COPD.


This study suggests that while small differences in the Six Minute Walk may be statistically significant, the minimum difference required before patients report any subjective improvement in their walking ability is 54 metres. By comparison, only the PACE GET treatment group reported a mean difference greater than this (barely, at 67 metres), and that difference includes improvement likely to be due to time or usual SMT. Once SMT is discounted, the difference falls below the level likely to yield a subjectively useful clinical improvement (45).
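
That comparison, sketched in Python: the 54 m MID and the ~67 m GET-group improvement are the figures quoted above; the improvement attributable to time or usual SMT alone is left as a placeholder, since the exact figure isn't given here.

    MID_METRES = 54            # minimum important difference from the COPD study above
    get_improvement = 67       # mean 6MW improvement in the PACE GET arm (from the post)
    smt_improvement = None     # hypothetical placeholder: fill in from the trial paper

    print(get_improvement >= MID_METRES)       # True, but only barely
    if smt_improvement is not None:
        net = get_improvement - smt_improvement
        print(net >= MID_METRES)               # per the post, this falls below the MID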
 

anciendaze

Senior Member
Messages
1,841
bounds and bounders

This is not really new, just a restatement of defects in rating scales found by oceanblue. You don't need any advanced mathematics at all to understand the potential problem. This is how I understand it.

This was a misunderstanding:
The protocol has always stated that improvement should mean a 50% reduction in fatigue. The original scale went from 0 to 33 points. The scale used at the end of the study had four bins (0,1,2,3). Patients with fatigue ratings above 30 are frequently bedbound or housebound, thus unlikely to participate. A group mean of 28 suggests a large number of patients entering the study were in the range 25-30 on the first scale.

If a patient entered the study with a score of 28 and this dropped to 14, this would be counted as improvement on either scale, because 28->bin 3 and 14->bin 1, a drop of two points on the new scale. However, if I understand the relation between old and new scales correctly, a score of 15->bin 1 and 26->bin 3. This means patients who, on the old scale, dropped from 26, 27, 28 to 15, would now go from bin 3 to bin 1 on the new scale. None of these would have been counted as improvement before; all would count as improvement after the change.

Preliminary estimates of the effect of the change indicate the study might lose counting improvements on patients with relatively low scores due to the change, but these would be less numerous. We can't actually demonstrate this because we do not have access to detailed data, only averages and group measures.
End of major blunder
Another problem shows up with the scale for physical activity. Originally, the study required scores of 30-60 for entry. This was expanded to 30-65. Quantization of this scale is limited to multiples of 5, so the number of bins is much smaller than it looks. At present it looks like a patient could enter with a score of 65, drop to 60, and still be counted as within the 'normal' range, which certainly seems wrong. There was no need to exclude patients whose activity score dropped to 25 (the first score below 30) because they would be unlikely to show up for meetings. Interventions to protect patients from "adverse outcomes" would be likely in this range.

Both changes give the appearance of researchers shopping for metrics to improve results. Both are subject to natural bounds which limit ability to participate. These could have an impact on reported results which outsiders cannot check.

The major absurdity is that these are subjective ratings in a study where a major effort went into changing subjective values. Remembering or forgetting a single activity could change an activity score by 10, and we all know about problems with cognitive deficits. There are also plenty of labeled terms in the literature for bias introduced by either peers or authority figures. You could hardly exclude these from a study of the effectiveness of changing "illness beliefs".
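
That 'single activity' figure follows directly from the standard SF-36 PF scoring rules, sketched here: ten activities each rated 1 ('limited a lot'), 2 ('limited a little') or 3 ('not limited'), with the raw sum rescaled to 0-100, so each item moves the final score by 5 points per step.

    def sf36_pf(items):  # items: 10 ratings, each 1, 2 or 3
        raw = sum(items)            # raw range 10-30
        return (raw - 10) * 5       # rescaled to 0-100

    print(sf36_pf([2] * 10))        # 50: 'limited a little' on everything
    print(sf36_pf([3] + [2] * 9))   # 55: one activity recalled a step better
    print(sf36_pf([1] + [2] * 9))   # 45: the same activity recalled a step worse

Remembering one activity as 'not limited' rather than 'limited a lot' is a 10-point swing, i.e. two of the scale's 5-point bins.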
 

Dolphin

Senior Member
Messages
17,567
Searchable full PACE Trial protocol (i.e. with questionnaires, etc.) now available

I previously re-circulated a copy of the full PACE Trial protocol with questionnaires.


Doogle has now sent me a searchable version of this which means one can search for words, etc. for which I am most grateful.


One should also be able to copy text.


I've put it up in two places as only 100 downloads are allowed for each:

http://bit.ly/h3tdO8 i.e.
https://www.yousendit.com/download/T2pGTXRZQTZPSHhjR0E9PQ

and

http://bit.ly/dLQjKy i.e.
http://www.yousendit.com/download/T2pINnFFdVVubVh2Wmc9PQ

Eventually (in just under a week) the facility to download the file will expire.


To explain again the reason for this:

There is a protocol in the public domain:
White PD, Sharpe MC, Chalder T, DeCesare JC, Walwyn R; PACE trial group.
Protocol for the PACE trial: a randomised controlled trial of adaptive pacing, cognitive behaviour therapy, and graded exercise, as supplements to standardised specialist medical care versus standardised specialist medical care alone for patients with the chronic fatigue syndrome/myalgic encephalomyelitis or encephalopathy. BMC Neurol. 2007 Mar 8;7:6.
http://www.biomedcentral.com/1471-2377/7/6


However, it refers to lots of questionnaires people may not be familiar with and might have a hard time getting (for example, Work and Social Adjustment Scale which I don't believe I ever saw before despite it being used). These measures could also be useful when trying to read other studies.


Best of luck to anybody trying to write letters to the Lancet. Remember the rules are letters of 250 words or less and no more than 5 references (which means 4 if one quotes the Lancet article, and 3 if one also quotes the protocol in BMC neurology!). Submissions are made through:
http://ees.elsevier.com/thelancet/
 

Dolphin

Senior Member
Messages
17,567
A relative newbie on the PACE Trial, etc. (not a detailed analysis)

(May be re-posted)

Here's a relative newbie talking about the PACE Trial, etc. They gave me permission to repost what they said. Not news to a lot/most of you, of course. She's referring to a letter I sent in to a newspaper.

Thank you so much, once again, for submitting yet another letter on this important and controversial subject.

I saw the article (in the Indo?) and was dismayed at the suggestions that Graded Exercise works and, as you said, that people with ME/CFS basically have "attitude and behavioural issues". I found Graded Exercise to be very damaging, and the people who believed it to be helpful to be very sneering and cynical about my views on the subject - even though I was/am the sufferer, not them!

Also, I felt my views were not only more informed as a sufferer, but also, that as a formerly very fit, very healthy, very active sportsperson all of my life (until I got Epstein Barr and subsequently CFS), I am certainly not averse to exercise - I would love to be more active, all the time. I absolutely hate having to "measure out my energy" every day, however, this is what it takes to improve the condition and this is what has been slowly, slowly working for me. I sincerely hope people will not go mad on Graded Exercise now, but adopt the very real and useful approach of managing and measuring your energy and activities, and making new choices. Hopefully this will work for all of us, however slowly.

With best regards,
 

Mark

Senior Member
Messages
5,238
Location
Sofa, UK
They use this paper to justify changes to their primary measures: http://www.ncbi.nlm.nih.gov/pubmed/19455540

It prompted three replies and then a rejoinder. Maybe they cited a controversial statistics (!) paper in order to justify dubious manoeuvres? Is anyone here likely to have a good enough understanding of statistics to comment?

PACE also cited this paper on meaningful measures in medicine: http://www.mayoclinicproceedings.com/content/77/4/371.long

I wanted to paste it again here (Oceanblue posted it earlier) to remind me to go through it when I'm feeling less demented.

Here's the abstract from the first link above - part of the justification for moving the goalposts:

Measurement in clinical trials: a neglected issue for statisticians?

Abstract

Biostatisticians have frequently uncritically accepted the measurements provided by their medical colleagues engaged in clinical research. Such measures often involve considerable loss of information. Particularly unfortunate is the widespread use of the so-called 'responder analysis', which may involve not only a loss of information through dichotomization, but also extravagant and unjustified causal inference regarding individual treatment effects at the patient level, and, increasingly, the use of the so-called number needed to treat scale of measurement. Other problems involve inefficient use of baseline measurements, the use of covariates measured after the start of treatment, the interpretation of titrations and composite response measures. Many of these bad practices are becoming enshrined in the regulatory guidance to the pharmaceutical industry. We consider the losses involved in inappropriate measures and suggest that statisticians should pay more attention to this aspect of their work.
It does seem very technical, and it isn't obvious how it applies to the changing of the measures they used, but part of what I read there is an argument against using "inappropriate measures"... and maybe I'm reading too much into it, but here's an argument I could paraphrase which may or may not be relevant...

So...if you do a study to examine some major effect, and get the right numbers of people to examine that effect - say, 640 people in 4 groups - then you've got your overall, large-scale study design, designed in advance to have the right statistical properties to give meaningful results (allegedly).

Then: suppose you collect more detailed data as well, along the way: lots of other dimensions of information. That more detailed data wasn't the core of the original experiment, so its statistical properties aren't part of the design. Then, after the study, you find correlations between variables that you weren't explicitly looking for. Perhaps within one of the 4 groups of 160 you find a further breakdown that suggests - say - that the people who responded well to the CBT were all also people who were on antidepressants at the start of the therapy. I guess such analysis would be a 'responder analysis'?

But analysing that correlation between the antidepressants and the success of the CBT wasn't part of the original design, and you didn't therefore design the numbers and everything else in order to make those results statistically valid. Then, if you find something that looks significant - like only the people on antidepressants got better from CBT - you might falsely think that was significant, due to misunderstanding of randomness and statistics.

Therefore, the "inappropriate measures" shouldn't be taken in the first place, otherwise you'll end up drawing unjustified conclusions.

That's a very rough paraphrase of the sense I get from the abstract...and that this argument has been used to remove measures that would be 'inappropriate'. And of course it's just another convenient coincidence that the hypotheses we would form to explain these results are precisely the details of responder analysis that must not be measured...

How that relates to stopping using actometers etc in practice I'm not sure...that's perhaps a separate issue...but it rather looks like this type of argument was used to justify not measuring all the things they originally were going to measure...

It just looks like a very complex argument - one that nobody was able to dig into deeply enough to refute in the short time given for a response - used to justify not measuring precisely those things they don't want measured, in case they show results they don't like... and I'm really not sure how it justifies changing your study design halfway through... but it makes me wonder which further detailed findings, of the kind we might predict, may not be revealed in future on the basis of this sort of justification.

In other words, it just looks like another swindle, and a particularly hard one to unpick...and probably only relevant now in relation to proving a manipulation which amounts to either deliberate fraud or (more kindly) the influence of the subconscious biases of the researchers over the evolving study design.