Documents relating to a complaint about a Lancet editorial in 2011 on the PACE Trial & CFS

Dolphin · Sep 20, 2017

I have been passed these documents, and been told it's fine to pass them on to others too or make them public.

Dolphin · Sep 20, 2017

Was the criterion [for recovery] a score on both fatigue and physical function within the range of the mean plus (or minus) one standard deviation of a healthy person's score? Yes, as detailed above. The Lancet article2 contains the following definition (under "statistical analysis"): "we compared the proportions of participants who had scores of both primary outcomes within the normal range at 52 weeks. This range was defined as less than the mean plus 1 SD scores of adult attendees to UK general practice of 14·2 (+4·6) for fatigue (score of 18 or less) and equal to or above the mean minus 1 SD scores of the UK working age population of 84 (-24) for physical function (score of 60 or more).32,33"

This is rubbish. The SF-36 physical function data did not relate to healthy people.

And indeed the PACE Trial authors were forced to admit it related to the full population, not just those of working age:

We determined the normal range by use of the conventional mean plus or minus 1 SD from what we regarded as the most relevant general population data. For physical function, this was a demographically representative sample (in our paper we stated that this was a UK working-age population, whereas more accurately this should have been an English adult population).3

http://www.thelancet.com/journals/lancet/article/PIIS0140-6736(11)60651-X/fulltext

The fatigue data didn't relate just to healthy people either:

General population participants were registered with five group general practices in the southeast of England chosen to ensure a mix of social class and urban vs. rural distribution. As family doctors in the UK are free of charge, all residents have the right to register. Recruiting representative community samples from general practitioner's lists is a widely used method. Thirty-one thousand, six hundred and fifty-one men and women aged 18–45 registered with the five general practices were asked to take part in the original study (Stage 1) (see Ref. [3]). For this current study, only completed data from those who went to see their general practitioner the following year with either a viral illness or a complaint other than a viral illness were used in this study. More detailed description of the sample and recruitment procedures is reported elsewhere (Stage 2) (see Refs. [18,19]). The 1615 patients used for this study are those who completed all the items of the fatigue scale. Aside from knowing that participants presented to the GP with either an infection or another complaint, no further information regarding medical condition, illness, and general practitioner attendance was sought.
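For anyone who wants to check the arithmetic, the "normal range" thresholds in the passage quoted above can be reproduced from the means and SDs the paper gives. This is only a minimal sketch in Python: the function names are my own (hypothetical), and the figure of 85 is the SF-36 physical function threshold from the protocol's recovery definition, as quoted in the Lancet's letter further down this thread.

```python
# Means and SDs as quoted from the Lancet paper's "statistical analysis" text.
FATIGUE_MEAN, FATIGUE_SD = 14.2, 4.6   # adult attendees to UK general practice
PF_MEAN, PF_SD = 84.0, 24.0            # SF-36 physical function reference sample

# SF-36 physical function threshold from the protocol's recovery definition.
PROTOCOL_PF_THRESHOLD = 85


def normal_range_bounds():
    """Return (fatigue upper bound, physical function lower bound)."""
    fatigue_upper = FATIGUE_MEAN + FATIGUE_SD  # mean plus 1 SD
    pf_lower = PF_MEAN - PF_SD                 # mean minus 1 SD
    return fatigue_upper, pf_lower


def within_normal_range(fatigue_score, pf_score):
    """Paper's rule: fatigue below mean + 1 SD, function at or above mean - 1 SD."""
    fatigue_upper, pf_lower = normal_range_bounds()
    return fatigue_score < fatigue_upper and pf_score >= pf_lower


def meets_protocol_pf_threshold(pf_score):
    """Hypothetical helper: compares a score with the protocol's 85 cut-off."""
    return pf_score >= PROTOCOL_PF_THRESHOLD


if __name__ == "__main__":
    print(normal_range_bounds())
    print(within_normal_range(18, 60), meets_protocol_pf_threshold(60))
```

On these figures a participant scoring 60 on SF-36 physical function counts as "within the normal range" while sitting 25 points below the protocol's recovery threshold.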

Dolphin · Sep 20, 2017

Editorial in question:
http://www.thelancet.com/journals/lancet/article/PIIS0140-6736(11)60172-4/fulltext

Dolphin · Sep 20, 2017

I put the document "2 Astrid James of Lancet to PCC" through OCR:

THE LANCET

RICHARD HORTON EDITOR ASTRID JAMES DEPUTY EDITOR

Elizabeth Cobbe

Press Complaints Commission

Halton House 20/23 Holborn London EC1N 2JD

February 12th 2013

Dear Ms Cobbe

Thank you very much for your email of Feb 5. We are disappointed that the PCC has chosen to continue to investigate the Countess of Mar's complaint despite the fact that she has breached the PCC's guidance by publishing excerpts of my response without my consent, and while the complaint remains under investigation. We ask the PCC to reject the complaint on this basis, or, at the very least, to take this breach of its guidance into account in its decision making.

We stand by our confidence in the accuracy of Drs Bleijenberg & Knoop's Comment, as do the authors themselves, and are happy to provide further information and clarification.

We are very concerned, however, that the situation is being oversimplified. Clinical research is complex in execution and interpretation, and we urge the PCC to consider all aspects of the case. The complaint focuses narrowly on alleged discrepancies between the PACE article in The Lancet and Bleijenberg & Knoop's Comment; in fact the trial protocol and a recent publication from PACE need to be taken into account to achieve a full understanding of the case. We note that the complaint is about the Comment, yet confuses the matter by also criticising the Lancet paper written by Prof White and co-workers.

PACE was a clinical trial carried out several years ago. As is customary, planning of the trial involved preparation of a very detailed protocol setting out details of the patients to be included in the trial and the treatments that they would receive, as well as the assessments that would be made of the patients' health; this document was published in 2007 in BMC Neurology.1 Clinical trials must follow the protocol on which they are based, which the PACE trial did, with the exception of changes incorporated into the detailed statistical analysis plan that were justified scientifically. Clinical trials submitted to The Lancet must be submitted with their protocols, as stated in our Information for Authors. The PACE protocol, which links to The Lancet 2011 paper on The Lancet's website via the Methods section (so the protocol is directly accessible via www.thelancet.com and should be considered to be part of the Lancet paper), contains the following definition of recovery:

"Under Secondary Outcome Measures: 4. "Recovery" will be defined by meeting all four of the following criteria: (i) a Chalder Fatigue Questionnaire score of 3 or less [27], (ii) SF-36 physical function score of 85 or above [47,48], (iii) a CGI score of 1 [45], and (iv) the participant no longer meets Oxford criteria for CFS [2], CDC

criteria for CFS [1] or the London criteria for ME [40]." (The reference numbers refer to those cited in the full protocol document.)

Clearly, defining "recovery" is complex, and the PACE research article in The Lancet reported the main participant-rated primary outcomes from the trial (the Chalder fatigue questionnaire and the SF-36 physical function score).2 These two primary outcome measures are valid and reliable and have been used in previous trials. On page 831 in the Results section of the PACE research article in The Lancet, the authors report "25 (16%) of 153 participants in the APT [adaptive pacing therapy] group were within normal ranges for both primary outcomes at 52 weeks, compared with 44 (30%) of 148 participants for CBT [cognitive behaviour therapy], 43 (28%) of 154 participants for GET [graded exercise therapy], and 22 (15%) of 152 participants for SMC [specialist medical care]". The PACE article was accompanied by a Comment piece by Bleijenberg & Knoop.3 The Comment authors state, on page 787, that "PACE used a strict definition for recovery: a score on both fatigue and physical function within the range of the mean plus (or minus) one standard deviation of a healthy person's score. In accordance with this criterion, the recovery rate of cognitive behavioural and graded exercise therapy was about 30%—although not very high, the rate is significantly higher than that with both other interventions".

Bleijenberg & Knoop have told The Lancet that "one way of defining recovery is to say that a patient is no longer fatigued and disabled, two key elements of the CDC definition of CFS. One could further operationalise recovery as scoring within normal range on questionnaires assessing both aspects. Using this criterion, the recovery rate of CBT and GET in the PACE trial was about 30%. We think that this indicates that recovery following behavioural interventions of CFS is possible".

It is also worth pointing out that the PACE team have now reported their full data on recovery in a peer-reviewed paper, and it seems that these data appeared in the public domain after the Countess of Mar's complaint.4 It is important to note that there is nothing suspicious or unusual in this process. Reporting on the findings of clinical trials often results in several papers being published in different journals over a period of several years—not only does detailed statistical analysis of the data take time but also peer review and publication can proceed slowly.

With this background, we believe that definitive answers can now be obtained to the issues raised in the complaint:

Did PACE use a strict criterion for recovery? Yes, it did; as detailed above this definition is included in the published trial protocol, and results based on this criterion have now been published in the Lancet paper2 and in the 2013 PACE article.4

Was the criterion [for recovery] a score on both fatigue and physical function within the range of the mean plus (or minus) one standard deviation of a healthy person's score? Yes, as detailed above. The Lancet article2 contains the following definition (under "statistical analysis"): "we compared the proportions of participants who had scores of both primary outcomes within the normal range at 52 weeks. This range was defined as less than the mean plus 1 SD scores of adult attendees to UK general practice of 14·2 (+4·6) for fatigue (score of 18 or less) and equal to or above the mean minus 1 SD scores of the UK working age population of 84 (-24) for physical function (score of 60 or more).32,33"

Was the recovery rate in patients treated with cognitive behavioural therapy or graded exercise therapy about 30%? These are reasonable estimates, as described above.

Does the PACE trial show that recovery from chronic fatigue syndrome is possible? As noted by Bleijenberg & Knoop and others, this depends on the definition of "recovery" used. But based on the results presented in the 2011 Lancet paper and the 2013 PACE article, we can conclude that a proportion of patients do recover. We should bear in mind, of course, that the mechanistic basis of chronic fatigue syndrome is poorly understood, and we cannot expect the high rates of recovery achieved with drugs such as antibiotics where pathogenic mechanisms are well characterised.

In sum, we have provided complete answers to the complaint from the Countess of Mar.

We ask the Commission to dismiss the complaint.

As far as the wider context of research on chronic fatigue syndrome is concerned, there has been a recent debate on PACE in the House of Lords which the Commission may find interesting.5 Other than the Countess of Mar, who tabled the debate, the consensus appears to have been strongly supportive of the PACE trial, which Lord Winston is quoted as having described as "an example of really excellent research".

Please note that this letter is not intended for publication, and we trust that the complaint will be judged invalid if this letter, or any part of it, should appear in the public domain.

Yours sincerely

Astrid James Deputy Editor The Lancet

1 White PD, Sharpe MC, Chalder T, DeCesare JC, Walwyn R, and the PACE trial group. Protocol for the PACE trial: a randomised controlled trial of adaptive pacing, cognitive behaviour therapy, and graded exercise as supplements to standardised specialist medical care versus standardised specialist medical care alone for patients with chronic fatigue syndrome/myalgic encephalomyelitis or encephalopathy. BMC Neurology 2007; 7:6.

2 White PD, Goldsmith KA, Johnson AL, et al, on behalf of the PACE trial management group. Comparison of adaptive pacing therapy, cognitive behaviour therapy, graded exercise therapy, and specialist medical care for chronic fatigue syndrome (PACE): a randomised trial. Lancet 2011; 377: 823-36.

3 Bleijenberg G, Knoop H. Chronic fatigue syndrome: where to PACE from here? Lancet 2011; 377: 786-88.

4 White PD, Goldsmith K, Johnson AL, Chalder T, Sharpe M, and PACE Trial Management Group. Recovery from chronic fatigue syndrome after treatments given in the PACE trial. Psychol Med 2013; 1-9. DOI:10.1017/S0033291713000020.

5 http://www.publications.parliament.uk/pa/ld201213/ldhansrd/text/130206-gc0001.htm#130206114000195 (accessed Feb 12, 2013).

snowathlete · Sep 20, 2017

Dolphin said:
This is rubbish. The SF-36 physical function data did not relate to healthy people.

And indeed the PACE Trial authors were forced to admit it related to the full population, not just those of working age

http://www.thelancet.com/journals/lancet/article/PIIS0140-6736(11)60651-X/fulltext

The fatigue data didn't relate just to healthy people either:

Glad someone has been looking into this.

Wow, that's really bad, to provide answers that are so obviously not true to the PCC. It's one of the most blatantly untrue things I've seen in relation to this. Problem is, with questions asked behind the scenes like this, there is no one to correct it.

Esther12 · Sep 29, 2017

I left this page open, but then got distracted by SMILE.

snowathlete said:
Glad someone has been looking into this.

Wow, that's really bad, to provide answers that are so obviously not true to the PCC. It's one of the most blatantly untrue things I've seen in relation to this. Problem is, with questions asked behind the scenes like this, there is no one to correct it.

This does look closer to dishonesty than mere incompetence.

It's amazing to me how badly the Lancet has behaved with this.... and been able to get away with it for so long!

Snow Leopard · Sep 30, 2017

These people need to learn about histograms and look at the SF-36 data. They wouldn't be making such incorrect claims if they understood statistics.

Barry53 · Sep 30, 2017

Please note that this letter is not intended for publication, and we trust that the complaint will be judged invalid if this letter, or any part of it, should appear in the public domain.

A little coy about what they said in the letter it would seem.

Barry53 · Sep 30, 2017

The Lancet reported the main participant-rated primary outcomes from the trial (the Chalder fatigue questionnaire and the SF-36 physical function score).2 These two primary outcome measures are valid and reliable and have been used in previous trials.

Deeply suspect of course. Validity and reliability are heavily dependent on the context in which the outcome measures are applied; a decision had to be made as to whether they were valid/reliable in the context of PACE. That very decision-making process is itself highly subjective unless very carefully managed and controlled. Given the trial was unavoidably unblinded, and run by people totally wedded to the outcomes they aimed to (and supposedly did) "prove", I think their trial execution was itself heavily loaded with considerable subjectivity from the start, let alone their subjective outcome measures.

Did the peer reviewers really not raise the issue that, if a trial is fully unblinded (as some unavoidably are), allowing it to be run by scientists with a 100% lock-in to the outcomes they desperately aspire to prove is just so dangerous and unethical?

HowToEscape? · Sep 30, 2017

What exactly is on that SF 36 list? Does it test for PEM? If the patients do not have significantly reduced or eliminated next day PEM, then they have no meaningful objective improvement.

Esther12 · Sep 30, 2017

HowToEscape? said:
What exactly is on that SF 36 list? Does it test for PEM? If the patients do not have significantly reduced or eliminated next day PEM, then they have no meaningful objective improvement.

This video, along with some explanation in the details, should be useful:

user9876 · Sep 30, 2017

Barry53 said:
A little coy about what they said in the letter it would seem.

They should be concerned about it. They say how recovery is defined in the protocol and then say that this has been published in the PACE recovery paper, which it was not. They then claim that this shows that people can recover, hence the editorial was correct.

This is of course misleading, because the recovery paper doesn't cover the protocol definition, and once data were available for recovery to be calculated as defined in the protocol, there was no significant recovery.

anciendaze · Oct 1, 2017

Still wondering how they got that mean minus one standard deviation when the mode was 95 and there was a hard cut-off at 100. A casual glance at the histogram suggests there might be many scores above 100, if you were to allow them. You are using the mean and variance of a one-sided distribution as if it were not merely two-sided and symmetrical, but actually normal (Gaussian).

"If you call a tail a leg, how many legs does a dog have?" "Four. Calling a tail a leg doesn't make it one."

Second, we know for a fact that many of the people in the reference population they were using had unambiguous illnesses, some as serious as COPD or heart failure. Was there any attempt to separate these from the healthy reference group?

What they demonstrated was that if you ignore the difference between people who are sick and well, you can decide that nobody can be seriously ill without a note from their doctor. This quality of reasoning, which commonly takes place in pubs near closing time, does not require or deserve research funding.
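A toy illustration of the two points above (a hard ceiling at 100, and a reference sample that mixes sick and healthy people) may help. The counts below are entirely made up for illustration and are not real SF-36 survey data; the names `HEALTHY`, `ILL` and `lower_bound` are hypothetical. The sketch just compares "mean minus 1 SD" computed on the whole mixed population with the same statistic computed on the healthy subgroup alone.

```python
import math

# HYPOTHETICAL counts per 1000 people, NOT real SF-36 data.
# Scores move in steps of 5 and cannot exceed 100 (the ceiling).
HEALTHY = {100: 480, 95: 200, 90: 80, 85: 40}     # 800 people bunched at the top
ILL = {score: 10 for score in range(0, 100, 5)}   # 200 people spread over 0..95


def expand(counts):
    """Turn a {score: count} table into a flat list of individual scores."""
    return [score for score, n in counts.items() for _ in range(n)]


def lower_bound(scores):
    """Mean minus one (population) standard deviation."""
    m = sum(scores) / len(scores)
    sd = math.sqrt(sum((s - m) ** 2 for s in scores) / len(scores))
    return m - sd


healthy_scores = expand(HEALTHY)
all_scores = healthy_scores + expand(ILL)

whole_threshold = lower_bound(all_scores)        # threshold from the mixed sample
healthy_threshold = lower_bound(healthy_scores)  # threshold from healthy people only
share_healthy_below = (
    sum(s < whole_threshold for s in healthy_scores) / len(healthy_scores)
)

if __name__ == "__main__":
    print(round(whole_threshold, 1), round(healthy_threshold, 1), share_healthy_below)
```

With these invented counts the mixed-population threshold lands in the low 60s, while the healthy-only threshold is above 90, and no one in the healthy subgroup scores anywhere near the mixed-population cut-off. Real data would give different numbers; the sketch only shows the direction of the effect.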

Esther12 · Oct 1, 2017

anciendaze said:
Second, we know for a fact that many of the people in the reference population they were using had unambiguous illnesses, some as serious as COPD or heart failure. Was there any attempt to separate these from the healthy reference group?

No, to get a mean minus 1 SD of 60, they included all sick and elderly members of the population.

anciendaze · Oct 1, 2017

@Esther12
Of course, it might be 60 or it might be 65, depending on what the authors needed at the moment. By changing this they were tacitly arguing that a change of 5 points was not significant. Strangely, this did not apply to changes which might indicate recovery. This is an internal inconsistency in the authors' reasoning.

I've made more complicated arguments about that distribution. If it is an example of Lévy distribution, a stable distribution generally other than normal, the theoretical distribution has no finite variance. This is not a good basis for any of the techniques used in analysis of variance. In practice this means values for variance and standard deviation are entirely determined by the bounds on the sampling. If this is true then manipulating bounds can give you whatever answer you want.
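The claim that the standard deviation of a heavy-tailed distribution is set by the sampling bounds can be checked numerically. The sketch below is an illustration under stated assumptions, not an analysis of SF-36 data: it uses a standard Cauchy distribution as a stand-in for a stable distribution with no finite variance, and the function names, sample size and bounds are arbitrary choices of mine. It draws one sample, truncates it at several bounds, and compares the resulting SDs with those of a normal sample treated the same way.

```python
import math
import random


def sd(values):
    """Population standard deviation, computed directly."""
    m = math.fsum(values) / len(values)
    return math.sqrt(math.fsum((v - m) ** 2 for v in values) / len(values))


def truncated_sd(samples, bound):
    """SD of the samples that fall inside [-bound, +bound]."""
    return sd([x for x in samples if -bound <= x <= bound])


def demo(n=100_000, seed=0, bounds=(10, 100, 1000)):
    rng = random.Random(seed)
    # Standard Cauchy via the inverse CDF: tan(pi * (u - 1/2)), u uniform on [0, 1).
    cauchy = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
    normal = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return {
        "cauchy": [truncated_sd(cauchy, b) for b in bounds],
        "normal": [truncated_sd(normal, b) for b in bounds],
    }


result = demo()

if __name__ == "__main__":
    print(result)
```

For the normal sample the SD stays close to 1 whichever bound is used, whereas for the Cauchy sample it keeps growing as the bound is widened. Whether the real score distribution behaves like this is a separate question that the sketch does not answer.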

Now, there might be a normal distribution of healthy people, although this group made no effort to find it or separate it from the distribution of sick people. They effectively precluded any such attempt by using a measure with a hard cut-off at 100. I suspect the scale was designed so that healthy people scored at least close to 100. Some people, being healthier than average, would be expected to score above the mean, but we know nothing about them.

The problem comes in modeling that "fat tail" of sick people. I've tried several combinations of different distributions to model the complete composite distribution. One thing I learned from the exercise is that a wide range of standard deviations might be used in modeling that tail. This is pragmatic evidence that little weight should be given to statistical inference based on analysis of variance in this situation.

Lévy distributions turn up in many cases where ordinary statistical measures are likely to underestimate the chance of catastrophic failures. Everyone should now be aware that banking, insurance and stock-trading on the assumption that negative excursions are as rare as predicted by normal distributions is dangerous. Similar mistakes happen when you talk about "hundred-year floods", as we have seen this year. Assuming people are still healthy when their performance has been reduced to a fraction of that of a healthy person is equally unrealistic.

Every doctor regularly sees people affected by a concatenation of biological insults, not all of which can be identified, and some of them die. Death is the cut-off that prevents people from being infinitely ill, as a mathematical model might allow. Assuming multiple problems are independent is a gross distortion of the truth, yet that is what assuming a normal distribution implies.

Doesn't anyone out there have a minimal basic understanding of the mathematics of sickness? It should be at least as sophisticated as the mathematics of reliability of complicated machines, and that predicts that catastrophic failures are much more likely than similarly naive analyses imply. People didn't believe me when I said this about the Space Shuttles, which are now museum exhibits. I was neither the first nor the last to say this, but honest realism conflicted with the need to lobby for funding.

Mithriel · Oct 2, 2017

I may be missing the point; if so, sorry. The SF-36 is just a list of questions about how people feel and what they can do. It is a man-made scale, devised so that doctors and patients could have a definite number to see if a treatment was helping or if a disease was progressing in things like cancer or Parkinson's disease, as it is very hard to remember or quantify how you feel now as opposed to how you felt 6 months ago.

The questions were worked out so that a healthy person would score 95 - 100 (most people can go out when they want to, for instance) and split into domains so that a doctor could see if a heart patient, say, was doing physically better while becoming depressed. It was never a normal distribution like height, say, which has a peak and then outliers.

In the PACE trial it was simply used wrongly and strangled to give them the result they wanted in a completely blatant way. Michael Sharpe said that if they hadn't changed the outcome most healthy people would have fallen within it, which was either a lie or ignorance.

At the time, part of the paper, or other things they wrote, said they used 2 standard deviations.
