
PACE Trial and PACE Trial Protocol

Dolphin

Senior Member
Messages
17,567
I don't think you need to subscribe to a journal to submit a paper to one. I'm not sure about the medical world, but I'm a computer scientist and I've published in journals that I've not subscribed to.
However, if the journal is not open access, it makes it harder to make the work publicly available.

I agree with you that you don't need to subscribe to a journal to submit a paper. My point was that most journals tend to fund themselves in one of two ways: subscriptions (which will generally mean they are not open access, or people wouldn't pay the subscription fee), or open access publication, which pays for itself through submission fees.
 

Graham

Senior Moment
Messages
5,188
Location
Sussex, UK
If one had two weight loss trials: one looked at intervention 1 and saw whether it produced a clinically useful difference (CUD); for intervention 2, participants were restricted to a certain weight range (e.g. 11 stone to 13 stone = 154 lbs to 182 lbs = 69.85 kg to 82.55 kg). It will be much easier for intervention 2 to reach a CUD, as the SD of the baseline scores will be quite small. The PACE Trial is like what happened in the trial of intervention 2.

What the reference the PACE Trial authors used suggested should be done is to use the standard deviation for the whole population.

There is another factor which hasn't been considered though. Your weight analogy is fine, but of course people's weights are fairly static. But what if each individual's weight varied greatly (just like our energy levels can)? That isn't a realistic possibility, so suppose we look at junk mail instead, and say that a typical person gets between 0 and 30 items of junk mail per day (it's only an example, so don't get picky!!). Now we ask everyone to record how many items of junk mail they got the day before, and take only those who got between 25 and 30 as being the ones who are seriously affected. Now we teach them the anti-junk mantra, which they have to recite a hundred times before switching on their computer ("downwithpacedownwithpacedownwithpace...") and ask them to record how many junk items they got. Which would be the appropriate standard deviation to use: their individual variation, or the variation between the individuals in the constrained group? It's quite obvious that the individual variation is important, but as far as I can tell, no-one considers this. It's as though they think that the variation between individuals is sufficient.
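Here is a rough simulation of both points in Python (a sketch of my own, with all numbers invented for illustration; nothing below comes from the PACE data). It shows that restricting entry to the worst-affected shrinks the between-person SD, so a "0.5 x baseline SD" threshold shrinks with it, and that when individual scores fluctuate from day to day, the selected group drifts back towards the middle on re-measurement even with no intervention at all:

import random
import statistics

random.seed(1)

N = 10_000
# Each person has a stable average junk-mail level (0-30), plus daily noise.
true_level = [random.uniform(0, 30) for _ in range(N)]
day1 = [t + random.gauss(0, 4) for t in true_level]  # within-person SD = 4

print(f"SD across everyone:       {statistics.stdev(day1):5.2f}")

# Entry criterion: only the "seriously affected" (25-30 on day 1) get in.
selected = [i for i in range(N) if 25 <= day1[i] <= 30]
sel_day1 = [day1[i] for i in selected]
print(f"SD within selected group: {statistics.stdev(sel_day1):5.2f}")

# Re-measure the selected group on another day, with NO intervention at all.
day2 = [true_level[i] + random.gauss(0, 4) for i in selected]
print(f"selected group mean, day 1: {statistics.mean(sel_day1):5.2f}")
print(f"selected group mean, day 2: {statistics.mean(day2):5.2f}")

On a typical run the SD across everyone comes out around 9-10, the SD within the selected group nearer 1.5, and the selected group's mean drops by a point or two on day 2 without anyone reciting a single mantra.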

I'm working on a different analogy for the article for peer review, but I had no idea that people had to pay to have these published. Am I being utterly naive? Well, there's no chance of me doing that. At the moment I'm playing around with an idea of focusing on Likert versus bimodal scoring, and using ME as a case in point, rather than focusing on ME in particular. The idea was to broaden the audience. Now I've got my doubts as to whether to continue. I feel like a fish out of water here.
 

Dolphin

Senior Member
Messages
17,567
I'm working on a different analogy for the article for peer review, but I had no idea that people had to pay to have these published. Am I being utterly naive?
Hi Graham, just to be clear: for lots, probably most, journals, one doesn't have to pay to submit. It's just that most of these aren't open access, so only the abstract will be available for free. But there are some journals that are both open access and free to submit to; I gave one example. There are thousands of journals out there.
 

user9876

Senior Member
Messages
4,556
There is another factor which hasn't been considered though. Your weight analogy is fine, but of course people's weights are fairly static. But what if each individual's weight varied greatly (just like our energy levels can)? That isn't a realistic possibility, so suppose we look at junk mail instead, and say that a typical person gets between 0 and 30 items of junk mail per day (it's only an example, so don't get picky!!). Now we ask everyone to record how many items of junk mail they got the day before, and take only those who got between 25 and 30 as being the ones who are seriously affected. Now we teach them the anti-junk mantra, which they have to recite a hundred times before switching on their computer ("downwithpacedownwithpacedownwithpace...") and ask them to record how many junk items they got. Which would be the appropriate standard deviation to use: their individual variation, or the variation between the individuals in the constrained group? It's quite obvious that the individual variation is important, but as far as I can tell, no-one considers this. It's as though they think that the variation between individuals is sufficient.


To develop your analogy further: let's ask people how many pieces of junk mail they have without counting them. That is, perception acts as a function on a countable quantity, which is basically what we are doing when we run a survey. Now tell people that they have less junk mail with the intervention, and see how their perceptions change (even with a null intervention). I remember reading a paper by a Russian economist around this subject; I will try to dig it out.

Another interesting thing is that when you send out your survey, some people may have so much junk mail that they miss the survey (or, with ME, are not feeling well enough to fill out the form). So people either fill in the form on a different day when they notice it, or, to keep up the sample size, we fill in the missing value with the average.
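That last trick is easy to demonstrate (a quick sketch of my own, with invented numbers): if the people who miss the survey are the worst affected, filling the gaps with the average of those who did respond drags the group mean towards health and shrinks the spread.

import random
import statistics

random.seed(2)
# Higher score = more junk mail (or more fatigued). Invented distribution.
scores = [random.gauss(20, 6) for _ in range(1000)]

# Suppose the worst-affected (score >= 28) never see the survey.
observed = [s for s in scores if s < 28]
n_missing = len(scores) - len(observed)

# Mean imputation: replace every missing value with the observed average.
imputed = observed + [statistics.mean(observed)] * n_missing

print(f"true mean    {statistics.mean(scores):5.2f}, sd {statistics.stdev(scores):4.2f}")
print(f"imputed mean {statistics.mean(imputed):5.2f}, sd {statistics.stdev(imputed):4.2f}")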


I'm working on a different analogy for the article for peer-review, but I had no idea that people had to pay to have these published. Am I being utterly naive? Well there's no chance of me doing that. At the moment I'm playing around with an idea of focusing on Likert versus Bimodal scoring, and using ME as a case point, rather than focusing on ME in particular. The idea was to broaden the audience. Now I've got my doubts as to whether to continue. I feel like a fish out of water here.
I think it's a good idea to look at Likert vs bimodal scoring.

As well as the statistics around the scoring methods, I believe there is some psychology theory around having Likert items that are symmetrical (as many bad as good choices). In the past I've collaborated with a cognitive scientist who pressed us to use a Likert-type scale, so next time I meet her I will ask about the underlying theory.

I have some ideas around playing with surveys, turning the results of a set of questions representing a combination of topics into a single measure, and showing the statistical properties of that measure under different conditions. I've just started to write some code to explore different scenarios.
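(For what it's worth, the kind of scenario I have in mind looks roughly like this; it's only a sketch, and the item counts and noise levels are arbitrary choices of mine, not anyone's validated instrument.)

import random
import statistics

random.seed(3)

def total_score():
    # Two distinct underlying traits...
    physical = random.gauss(0, 1)
    mental = random.gauss(0, 1)
    # ...but 6 items are summed into a single measure regardless.
    items = ([physical + random.gauss(0, 0.5) for _ in range(4)]
             + [mental + random.gauss(0, 0.5) for _ in range(2)])
    return sum(items)

totals = [total_score() for _ in range(5000)]
print(f"single measure: mean {statistics.mean(totals):.2f}, sd {statistics.stdev(totals):.2f}")

# Opposite profiles collapse to the same total (ignoring the noise):
#   physical = +2, mental = -2  ->  4*(+2) + 2*(-2) = 4
#   physical = -1, mental = +4  ->  4*(-1) + 2*(+4) = 4
# That is exactly the information a summed measure throws away.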

The only time I've paid to publish articles is when I've gone over the page limit for conference papers. Paying (not my money) was easier than shortening the paper.
 

Snow Leopard

Hibernating
Messages
5,902
Location
South Australia
I would like to know what really drove the protocol changes too.

I am of the opinion that they always planned to do a bait-and-switch. Deviation from the protocol is the norm, not the exception, in science after all. The only difference here is the cover-up.

Given that they have never published any data on recovery (probably because there wasn't any difference between the groups), the idea that the data was complete is laughable.
 

user9876

Senior Member
Messages
4,556
I am of the opinion that they always planned to do a bait-and-switch. Deviation from the protocol is the norm, not the exception, in science after all. The only difference here is the cover-up.

Given that they have never published any data on recovery (probably because there wasn't any difference between the groups), the idea that the data was complete is laughable.

Isn't this the point to make to the Information Commissioner: that they are not being entirely honest around data release? In one request they say they won't release data because they intend to publish it one day, and in another they say all data has been published.

I don't think they had planned to change the protocols. I think they believe they are right, and hence that they would get results to prove their views.
 

Snow Leopard

Hibernating
Messages
5,902
Location
South Australia
I don't think they had planned to change the protocols. I think they believe they are right, and hence that they would get results to prove their views.

My opinion is: they've been doing this a long time; they knew what results they were going to get.

I kind of think the overstating/withholding of evidence is itself a social phenomenon - and they probably have to do this if they want to hold on to their slice of the pie (they justify this by thinking that if it wasn't for them, there would be nothing at all for CFS). The disappointing part is they expect concessions from those who favour the importance of biomedical factors, but provide no concessions in return.
 

Esther12

Senior Member
Messages
13,774
I think they expected to get genuinely good results too.

I'm less sure of this having seen the fallout from PACE, and the ways it's been spun. It does seem like a crazy waste of the researchers' lives if they'd intended to manipulate their results in this way... but then, it doesn't seem like honesty is a big priority in their lives.
 

user9876

Senior Member
Messages
4,556
I think they expected to get genuinely good results too.

I'm less sure of this having seen the fallout from PACE, and the ways it's been spun. It does seem like a crazy waste of the researchers' lives if they'd intended to manipulate their results in this way... but then, it doesn't seem like honesty is a big priority in their lives.

I think researchers can get so involved in particular methods and solutions that they just believe in them uncritically, whatever the outcome. If they stopped to think, they would realise that their techniques don't work, but they often don't think - or don't want their beliefs challenged (I believe the term is cognitive dissonance).
 

Esther12

Senior Member
Messages
13,774
Some of the post-PACE spin would have required intentional deceit imo. They could still tell themselves that they're doing it to help patients ("We need to get funding for these treatments which help some a bit"), but it's increasingly difficult to believe that at least some of them have not consciously decided to screw patients over in order to protect their own careers and reputations. If they'd made some sort of apology for the way the efficacy of CBT/GET was exaggerated by the manner in which they'd presented their results, I could have gone back to seeing them as well meaning but misguided - instead we had Esther Crawley going even further, and claiming PACE showed a 30-40% recovery rate! Maybe some of those involved are being misled by others, but it's difficult to see how these sorts of falsehoods could occur if they were all acting on good intentions and a commitment to honest science.

Unfortunately, believing this would serve to instantly discredit me before many researchers: "You're implying dishonesty in science?! What a militant crazy conspiracy theorist you must be - these CFS patients are a nightmare, I feel so sorry for the researchers who have to put up with their attacks."
 

Graham

Senior Moment
Messages
5,188
Location
Sussex, UK
Let's see now ... James Murdoch and the media, Fred Goodwin and RBS, Bob Diamond and Barclays, and now Nick Buckles and G4S – all senior management who had "no idea" of what was really happening and were unable to remember certain conversations. And of course we have PACE.

Personally I believe they all acted with purity in their hearts, and were convinced that what they were doing was utterly right and in the best interests of members of the public. Sorry, can't write more, Peter Pan has just popped in and we're off to Never Never Land.

Thanks, user 9876, for your comments. I'd be interested in what you find out about Likert scoring. My memory of it (from way back) was that it was set up to produce a more balanced distribution of scores. The aim was, say, to allow a score of 0 to 10, knowing that virtually nobody would pick 0 or 10, so that the distribution of answers was not curtailed and could move either way. Nothing like the Fatigue Scale then!
 

user9876

Senior Member
Messages
4,556

Thanks, user 9876, for your comments. I'd be interested in what you find out about Likert scoring. My memory of it (from way back) was that it was set up to produce a more balanced distribution of scores. The aim was, say, to allow a score of 0 to 10, knowing that virtually nobody would pick 0 or 10, so that the distribution of answers was not curtailed and could move either way. Nothing like the Fatigue Scale then!

Yes, they talk about two things. The first is the Likert item, which should be symmetrical around a neutral ('don't care') value: so it may be 1-7 (with 4 as the neutral value), or 1-5, or 0-10 as you say. It is considered important that the scale given covers the extremes of possible values. One of the big errors in the fatigue scale (and the SF-36) is that they do not do this. This then creates debate over the linearity of the scale, which will depend on things like the language used and how people map their concepts onto the scale. If the belief is that the scale is more or less linear, then means and standard deviations are thought OK; otherwise it is best to use medians and percentiles. I suspect there is a whole literature around how to ask questions; I remember a psycholinguist claiming he could construct a survey to get whatever answer he wanted by playing around with the questions!

The other concept is that of the Likert scale: summing together all the items (i.e. the question responses). Here there seems to be an important requirement that the Likert scale measures one attitude or thing (I would see this as one underlying random variable). Hence all the questions reflect someone's attitude to that thing, plus some error. I believe you should check that there is a high degree of correlation between the answers to all the questions. The reason for asking multiple questions then seems to me to be about the difficulty of getting people to reflect on their attitude directly, with the hope that the errors will even out over a set of questions.

My belief is that the fatigue scale doesn't do this, as it measures between 2 and 4 different things, the two big ones being mental and physical fatigue. Now, they do quote Cronbach's alpha, which is a measure of correlation between questions, but I've also read that this can be misleading where one set of questions dominates another. One question I have is how tightly correlated mental and physical fatigue are: can one go up while the other goes down, or do they normally change together? This seems to be an important question when looking at the fatigue scale.
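The alpha calculation itself is simple enough to play with (a sketch of mine; the formula is the standard one, but the data below are simulated, with an 11-item, 7-physical/4-mental split that only loosely mimics the fatigue scale). It shows the misleading case: two only weakly related factors, yet a high alpha.

import random
import statistics

def cronbach_alpha(rows):
    # rows: one list of item scores per respondent.
    k = len(rows[0])
    item_vars = [statistics.variance([r[i] for r in rows]) for i in range(k)]
    total_var = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

random.seed(4)
rows = []
for _ in range(2000):
    physical = random.gauss(0, 1)
    mental = 0.3 * physical + random.gauss(0, 1)  # only loosely correlated
    rows.append([physical + random.gauss(0, 0.6) for _ in range(7)]
                + [mental + random.gauss(0, 0.6) for _ in range(4)])

print(f"alpha = {cronbach_alpha(rows):.2f}")  # around 0.9, despite two distinct factors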

I've not managed to find a really good reference, though I've been digging around different web pages and stats forums.
 

Bob

Senior Member
Messages
16,455
Location
England (south coast)
Maybe some of those involved are being misled by others...

This could possibly be the case, as there are some very high-status scientists involved in this study, directly or indirectly. If I were a co-author (I wouldn't be), I would be very wary of getting into a protracted professional disagreement with someone like Wessely.

Interestingly, most of the authors have been very exact with their wording, always choosing carefully crafted ambiguous phrasing that can be interpreted in more than one way, so that it can't get them into trouble, even if it is misleading. For example, in the paper they repeatedly assert that CBT had a 'moderate' effect size (without qualification). It did indeed have a moderate effect size, but only for one of the primary outcomes, so they could equally have said that CBT had a 'small' effect size. They never mention CBT's failure to meet the threshold for a CUD. Interestingly, such obfuscation goes against the MRC's guidelines.
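(For anyone unfamiliar with the 'small'/'moderate' language: it usually refers to a standardised effect size such as Cohen's d, with conventional cut-offs of roughly 0.2 for small, 0.5 for moderate and 0.8 for large. A minimal calculation, with invented group scores rather than PACE's, looks like this.)

import math
import statistics

def cohens_d(a, b):
    # Standardised mean difference using the pooled standard deviation.
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)

treatment = [55, 61, 47, 58, 52, 63, 49, 57]  # invented outcome scores
control = [53, 58, 45, 55, 50, 60, 47, 54]
print(f"d = {cohens_d(treatment, control):.2f}")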

There are only one or two possible occasions where I've caught them in outright misleading phrasing.
One occasion was when Trudie Chalder talked about patients "returning to normal" at the press conference (recorded in a podcast). I've never heard White making this slip, because he knows that the 'normal range' does not indicate 'normal health'; he's been far more careful and exact with his wording. So was Trudie Chalder the ringleader, who knew exactly what she was doing? Or did she not understand the results because she was misled by others? Or was she just not careful enough to cover her tracks like the others were?

PACE Trial press conference podcast
Conference was held at the Science Media Centre on the 17th February 2011
Professor Trudie Chalder stated:
"And if you think about the number of people who get back to normal levels of functioning and fatigue then you see twice as many people in the graded exercise therapy and cognitive behavioural therapy group improving and getting back to normal compared the other two groups."
 

Bob

Senior Member
Messages
16,455
Location
England (south coast)
If the protocol is anything to go by, the expected results were so far removed from the reality of the actual results, that I think they must have expected the trial to be more successful. I don't think they realised it was going to be a failure until the results started coming in. And then they panicked, and had to start manipulating the methodology.

Actually, seeing the expected results in the protocol made me think that these psychiatrists do actually believe what they say. Until that point, I thought that they were just outright con artists, and that they didn't believe a word of what they said.

This is what the protocol says:

11. Sample Size
11.1 Assumptions
At one year we assume that 60% will improve with CBT, 50% with GET, 25% with APT and 10% with SSMC.

But in fact the improvement rates were an average of 12% for CBT, 13.5% for GET, a negative figure for APT, and approx 61% for SMC (as an average of both primary outcome measures).

See how the prediction of "10%" for SMC differs from the actual result of "61%" (58% SF-36 PF and 65% Chalder).
And "60%" for CBT is replaced with "12%" for CBT.
The results for SMC and CBT have been reversed.

So I think that they had actually deluded themselves that it was their own psychiatric therapies that were making CFS patients improve in such high proportions, when in fact it was the natural improvement with time, as seen in the SMC group.
 

user9876

Senior Member
Messages
4,556
This could possibly be the case, as there are some very high-status scientists involved in this study, directly or indirectly.

I would see it differently: I think they are deluded, and misleading themselves and hence others.
 

Bob

Senior Member
Messages
16,455
Location
England (south coast)
Thanks, user 9876, for your comments. I'd be interested in what you find out about Likert scoring. My memory of it (from way back) was that it was set up to produce a more balanced distribution of scores. The aim was, say, to allow a score of 0 to 10, knowing that virtually nobody would pick 0 or 10, so that the distribution of answers was not curtailed and could move either way. Nothing like the Fatigue Scale then!

You know all this already, Graham, but I'm just thinking out loud.

To me, bimodal scoring makes sense for the Chalder questionnaire, and Likert scoring doesn't make sense.
I don't know the history, but at a glance it seems that Chalder was sensibly designed to be used with bimodal scoring.
Bimodal is an intuitive and simple scoring system, with less room for subjective variations.
Quite simply, with bimodal, if you feel the same or better than you used to, you score a 'zero', and if you feel worse than you used to then you score a 'one'.
But using Likert scoring brings in loads of subjective variables, and complications.
The extra subjective element is that patients have to accurately distinguish between "more than usual" and "much more than usual" for the results to be consistent.
The extra complication is that feeling "the same as I used to" (my wording) is given a positive score, and feeling "worse than I used to" (my own wording) is given a higher positive score. So if someone feels exactly the same as they used to (i.e. perfectly well), then their Chalder score would be 11 out of 33. It also complicates things because if someone were to say they feel better than they used to, this would be scored differently again; after so many years of ill-health, this must surely be a very subjective measure.
Whereas, with bimodal scoring, if someone feels perfectly well (the same as they used to, or better than they used to), then their score would be zero, with the scores increasing as they feel more unwell.
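To make the two schemes concrete (a toy sketch; the 11-item structure and the four answer options match the Chalder scale, but the example answers are invented):

# Each of the 11 items is answered:
#   0 = "less than usual", 1 = "no more than usual",
#   2 = "more than usual", 3 = "much more than usual"

def likert_score(answers):
    return sum(answers)  # 0-33 overall

def bimodal_score(answers):
    # Each item collapses to 0 ("less/no more than usual") or 1 ("more/much more").
    return sum(1 for a in answers if a >= 2)  # 0-11 overall

perfectly_well = [1] * 11  # "no more than usual" on every item
print(likert_score(perfectly_well), bimodal_score(perfectly_well))  # -> 11 0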

I notice that there have been some studies that investigate Chalder scoring. I wonder how many of those have used Likert.
 

Enid

Senior Member
Messages
3,309
Location
UK
To a non-scientist it looks like a con: trying to measure a fluctuating illness like ME, knowing nothing of (and ignoring) the well-documented scientific/pathology findings, and applying dubious "scales". "How to make an illness disappear" (Prof Hooper).
 

Dolphin

Senior Member
Messages
17,567
If the protocol is anything to go by, the expected results were so far removed from the reality of the actual results, that I think they must have expected the trial to be more successful. I don't think they realised it was going to be a failure until the results started coming in. And then they panicked, and had to start manipulating the methodology.

Actually, seeing the expected results in the protocol made me think that these psychiatrists do actually believe what they say. Until that point, I thought that they were just outright con artists, and that they didn't believe a word of what they said.

This is what the protocol says:

11. Sample Size
11.1 Assumptions
At one year we assume that 60% will improve with CBT, 50% with GET, 25% with APT and 10% with SSMC.

But in fact the improvement rates were an average of 12% for CBT, 13.5% for GET, a negative figure for APT, and approx 61% for SMC (as an average of both primary outcome measures).

See how the prediction of "10%" for SMC differs from the actual result of "61%" (58% SF-36 PF and 65% Chalder).
And "60%" for CBT is replaced with "12%" for CBT.
The results for SMC and CBT have been reversed.

So I think that they had actually deluded themselves that it was their own psychiatric therapies that were making CFS patients improve in such high proportions, when in fact it was the natural improvement with time, as seen in the SMC group.
I agree there's a good chance they were deluded. However, I don't think the percentage figures you are comparing are the correct ones. When they said GET and CBT, they meant GET+SMC and CBT+SMC - and the figures for these weren't 12% and 13.5%. (Alternatively, take 10% away from their predictions.)
 

Dolphin

Senior Member
Messages
17,567
There are only one or two possible occasions where I've caught them in outright misleading phrasing.
One occasion was when Trudie Chalder talked about patients "returning to normal" at the press conference (recorded in a podcast). I've never heard White making this slip, because he knows that the 'normal range' does not indicate 'normal health'; he's been far more careful and exact with his wording. So was Trudie Chalder the ringleader, who knew exactly what she was doing? Or did she not understand the results because she was misled by others? Or was she just not careful enough to cover her tracks like the others were?

PACE Trial press conference podcast
Conference was held at the Science Media Centre on the 17th February 2011
Professor Trudie Chalder stated:
"And if you think about the number of people who get back to normal levels of functioning and fatigue then you see twice as many people in the graded exercise therapy and cognitive behavioural therapy group improving and getting back to normal compared the other two groups."

I wouldn't let Peter White off the hook that easily. There was also this. I think he was the driving force behind these newsletters and could be said to be responsible for their contents (along with TC & MS):

http://www.pacetrial.org/docs/participantsnewsletter4.pdf

How many patients improved and how many were back to normal?

Around six out of ten patients made an improvement in both fatigue and physical ability after CBT or GET, compared to four out of ten patients who improved with APT or SMC. The number of patients returning to normal levels of fatigue and physical function was about three out of ten after CBT or GET; about twice as many as those who received APT or SMC.
 

Dolphin

Senior Member
Messages
17,567
You know all this already, Graham, but I'm just thinking out loud.

To me, bimodal scoring makes sense for the Chalder questionnaire, and Likert scoring doesn't make sense.
I don't know the history, but at a glance it seems that Chalder was sensibly designed to be used with bimodal scoring.
Bimodal is an intuitive and simple scoring system, with less room for subjective variations.
Quite simply, with bimodal, if you feel the same or better than you used to, you score a 'zero', and if you feel worse than you used to then you score a 'one'.
But using Likert scoring brings in loads of subjective variables, and complications.
The extra subjective element is that patients have to accurately distinguish between "more than usual" and "much more than usual" for the results to be consistent.
The extra complication is that feeling "the same as I used to" (my wording) is given a positive score, and feeling "worse than I used to" (my own wording) is given a higher positive score. So if someone feels exactly the same as they used to (i.e. perfectly well), then their Chalder score would be 11 out of 33. It also complicates things because if someone were to say they feel better than they used to, this would be scored differently again; after so many years of ill-health, this must surely be a very subjective measure.
Whereas, with bimodal scoring, if someone feels perfectly well (the same as they used to, or better than they used to), then their score would be zero, with the scores increasing as they feel more unwell.

I notice that there have been some studies that investigate Chalder scoring. I wonder how many of those have used Likert.
Whatever about making a distinction between "more than usual" and "much more than usual", I think allowing item scores of 0, and overall scores of less than 11, is very problematic, like you say.