"Why PACE researchers & their institutions affectation of concern for participants is ..." (P Kemp)

Dolphin · Feb 6, 2016

Feb 5th, 2016

Why the PACE researchers and their institutions affectation of concern for participants is particularly sickening hypocrisy

By Peter Kemp MA

https://docs.google.com/document/d/...hw_5mi0sONeaWqmvU/edit#heading=h.12bp09trj8n6

anciendaze · Feb 6, 2016

I don't know how they get 60% responders, or how he gets 15%. Without data on how many improved, and by how much, you can't deduce this without guesswork. The very distinction between responders and non-responders tells you itself that the distribution is not normal, and therefore cannot be completely described by two parameters. You also have no good way to tell what the authors consider a clinically-significant benefit or harm. Published material leads to the absurdity that a patient with a decreased score can be counted as improved because he was somehow blessed with the authors' attention, while even those who were hospitalized during the trial suffered no harm.

If the authors had no intention of giving this impression, they had best clarify what they really meant.

snowathlete · Feb 6, 2016

Given the Research Council's policy on what qualifies as falsification, I think there are already serious concerns that the PACE authors may be guilty of it. Their seemingly irrational fight to keep the data secret suggests that the data may put this question beyond doubt. There is obviously a clear public interest in the data being released, and if it shows falsification then questions will have to be answered.

Bob · Feb 6, 2016

anciendaze said:
I don't know how they get 60% responders, or how he gets 15%. Without data on how many improved, and by how much, you can't deduce this without guesswork.

The 15% figure is the maximum net improvement rate (i.e. the additional proportion of patients who improved) when CBT/GET were added to SMC, for the two primary outcome measures. These outcomes use the 'clinically useful difference' as a post-hoc threshold for improvement. The figures are set out in Table 3 in the 2011 paper. (The rates ranged from 11-15% for CBT/GET when compared with SMC.)

The 60% figure seems to refer to a secondary post-hoc analysis outlined in the 2011 paper that measured improvers for both of the primary outcome measures. The exact figure is 59% for SMC+CBT and 61% for SMC+GET, and 45% for SMC alone, giving net figures of 14% for CBT and 16% for GET.
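
For anyone who wants to check the subtraction, here is a minimal Python sketch of how the net figures follow from the improver rates quoted above. The dictionary and function names are made up for illustration; only the three percentages come from the post.

```python
# Percentage of "improvers" on both primary outcomes, as quoted in the post.
improvers = {"SMC alone": 45, "SMC+CBT": 59, "SMC+GET": 61}

def net_improvement(arm, control="SMC alone"):
    """Improver rate in a therapy arm minus the rate in the control arm."""
    return improvers[arm] - improvers[control]

# Net improvement for each arm other than the control.
net = {arm: net_improvement(arm) for arm in improvers if arm != "SMC alone"}
print(net)
```

The "net" rate is simply what is left of the headline rate once the improvement already seen with SMC alone is taken away.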

anciendaze · Feb 6, 2016

@Bob

You realize you are describing voodoo when you talk about "maximum net improvement rate". You either have 11% or 15% depending on the measure you choose, and you certainly don't have 60%. Cherry-picking measures is frowned on in respectable research. The "net" part is also questionable because a 45% improvement with SMC strongly suggests this was not a valid control. There was no allowance in any of this for the problem of testing familiarity, which can supply that 15% all by itself. This regularly happens with academic testing, where testmanship can outperform substantially more useful learning. Training subjects to be better at filling out forms was not one of the stated goals. Even the 11% might be zero.

Setting post-hoc thresholds is not exactly "best practice" either. This has a strong flavor of drawing bullseyes around bullet holes, another form of cherry-picking.

None of this tells us anything about the criteria for harms. Since objective measures were secondary to subjective ones it is entirely possible for patients to undergo an objective decline while being counted as positive responders. See any problem with that?

The entire evolution from original protocol to post-hoc reanalysis is baffling. I simply don't know the relation between what they said at the beginning and what they presented at the end, and I no longer feel any need to try to trace this and judge the effect of changes. If they wanted people to understand what they did they could have worked to make things clear. Instead matters have become cloudier over time as a succession of strange and evasive responses have been published.

Skippa · Feb 7, 2016

Given the lengths they've gone to thus far...

Serious question, does anyone think that, if (when) forced to release the data, they won't just falsify the whole lot?

snowathlete · Feb 7, 2016

That would be extremely serious and risky for them to do, Skippa, as they may get found out. Who can say though? They've shown they can't be trusted, and they clearly think the real data would be damning, else they would have just released it, so it's not impossible, but I'd say it's probably unlikely. More likely they will attempt damage limitation, but I think release of the data would attract too many new voices from the science world, and those would be impossible to discredit in the same way they attempt to do to current commentators.

anciendaze · Feb 7, 2016

@Bob

I understand the explanation you were trying to make, and I am not attacking you for making it. The problem is the repeated misrepresentation of results by the authors in public statements that are hard to tie to published data. We should not have to speculate on how to interpret numbers. I'm fed up with trying to dig meaningful information out of those publications.

Here's an example where I don't know if I'm misreading them or not. Assume a data set which shows no net effect of therapy at all. The mean stays unchanged, but 45% improve and 45% are worse, with 10% remaining unchanged. Should an intervention which boosts 45% to 50% be reported as curing half? Should the 45% improved in the control group be counted as evidence that the intervention these patients did not receive is worthwhile?
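
To make the hypothetical above concrete, here is a minimal Python sketch. The cohort of 100 patients and the score changes of +1 and -1 are made-up numbers chosen only to match the 45/45/10 split; they are not PACE data.

```python
# Hypothetical cohort of 100 patients: change in score for each patient.
# 45 improve by one point, 45 get worse by one point, 10 stay the same.
changes = [1] * 45 + [-1] * 45 + [0] * 10

# The mean change is zero, so there is no net effect.
mean_change = sum(changes) / len(changes)

# Counting only patients with a positive change still gives 45% "improved".
improved_rate = sum(c > 0 for c in changes) / len(changes)
worsened_rate = sum(c < 0 for c in changes) / len(changes)

print(mean_change, improved_rate, worsened_rate)
```

A count of improvers taken on its own says nothing about the patients who moved the other way.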

The objective measures were apparently secondary in the sense that they were expected to confirm results obtained through patient responses on questionnaires. It seems they were anticipated to support the findings the authors were most interested in producing. We really don't know why they were there other than to supply the word "objective" to the study. These results were weakened by improper implementation of tests.

What we do know is that the "step test" showed no significant change in physical performance in any group. The change in the walk test for GET was only statistically significant if you overlook having 1/3 of the data missing because patients simply declined the test. Even if statistically significant it was not clinically significant.

At this point we run into one of the mysteries of the study: there was no plan for dealing with the situation where subjective and objective data disagreed, unless the authors planned to simply ignore objective data from day one. If objective data is allowed to confirm a result, it should also be allowed to invalidate it. Anything less is humbug.

This is especially important when we come to the question of harms. Were there patients with improved scores on questionnaires whose physical performance deteriorated? If you can't answer this, and don't even know how rates of hospitalization compare, you can't say much about harm.

Bob · Feb 7, 2016

anciendaze said:
I understand the explanation you were trying to make, and I am not attacking you for making it.

Thanks anciendaze. I guessed that was the case.

anciendaze · Feb 7, 2016

I think the term misrepresentation conveys the central problem with PACE. A long time back I was ready to ignore it, if there had not been a continuing succession of outrageous claims that were not supported by the published data.

My first inkling of the extent to which deception was intended started with the statement that the threshold for entry to the test was being "one standard deviation below the mean" for the U.K. population. Anyone who has experience with statistics based on normal distributions will immediately decide "these people are not very sick". The published diagnostic criteria are not particularly helpful.

I'm still not entirely sure what was wrong with those 640 patients, and strongly suspect some were included by diagnostic error. This would skew mean values and increase variance. It scarcely mattered what the number was, and even the authors didn't take the number too seriously, moving the entry value when the original proved inconvenient.

It required locating another paper and significant effort at interpretation to understand just how far the population data was from a normal distribution. In particular it contained many people over the age of 65, or with heart problems, pulmonary disease or arthritis -- all of which were screened out of the patient sample. Only after you have checked this can you understand that patients were being compared with people far out on the curve who had undeniably serious illnesses that were simply better explained. To treat this as a single normal distribution you had to decide that sick people were fundamentally the same as healthy people.

There was no need for an official lower bound on patient performance for the simple reason that patients beyond some point would not be able to participate in the preliminary investigations needed to get into the study. Most included patients were a short step from hospitalization, though those brought in with the revised criteria would be grouped nearer the upper threshold.

From that point on my opinion of the study kept getting lower with each new discovery of a contentious question which had been bypassed.

Sean · Feb 7, 2016

anciendaze said:
At this point we run into one of the mysteries of the study: there was no plan for dealing with the situation where subjective and objective data disagreed, unless the authors planned to simply ignore objective data from day one. If objective data is allowed to confirm a result, it should also be allowed to invalidate it. Anything less is humbug.

Just worth repeating.

anciendaze · Feb 8, 2016

@Sean

I'd like to expand on that by noting that there were originally 7 planned measures by which to judge success: 1) total activity measured with Actimeters; 2) the step test; 3) the walk test; 4) perceived fatigue; 5) activity reports; 6) employment data; 7) medical claims. The chosen primary measures, 4 and 5, were both done through questionnaires, for which there is abundant evidence in relevant literature that manipulation is possible. (Note that fatigue was simply assumed to be entirely subjective.)

In the end 1 was simply dropped, results from 2 were delayed until the paper on long-term benefits appeared, 3 was improperly implemented, while 6 and 7 were delayed until a promised paper on economic benefits appeared. Thresholds were also adjusted post hoc. The authors were honest in the sense of eventually publishing negative results, though with as much positive spin as possible. Delaying negative results is a tactic any competent politician will understand. You ride the crest of a spurious boost provided by isolated positive news as long as possible.

Cherry-picking your favorite measure out of 7 allows plenty of room for generating spurious results due to random fluctuations. I will note at this point that all objective measures were negative, with the possible exception of a walk test which omitted patients who didn't feel like taking it. Even the weak results in that case revealed patients who remained quite seriously ill after therapy. Those measures which yielded positive results were also most subject to manipulation. People reading press reports would never guess this.

There is a definite pattern here, and it is not pretty.

Added: here's breaking news on the reliability of questionnaires in assessing activity in a group known for reduced performance.
Bottom line: not very reliable.

anciendaze · Feb 8, 2016

The discrepancy between what the numbers say and what the authors say continues to impress me. Even though I know this is no more than reading tea leaves, in the absence of actual released data, bear with me while I examine another implication of the argument about "responders".

Bob has argued above that the percentage of responders was actually between 11% and 15%. Percentages give a false impression of precision, so let's convert this to between 1 in 6 and 1 in 9 of the participants in each arm of the study. 640 subjects provide 160 in each of 4 arms.

If only 1/9 of these responded, we would have 17 or 18 on which the results were based. But wait, there's more! Recall that 1/3 simply declined the walk test. This says objective results from GET were based on only 12 individuals. This is a far cry from the 640 touted in the news.
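
The back-of-envelope arithmetic above can be written out as a short Python sketch. It assumes, as the post does, equal arms of 160 and that the one third who declined the walk test were spread evenly across responders; the variable names are illustrative only.

```python
from fractions import Fraction

participants = 640
arms = 4
per_arm = participants // arms                 # 160 per arm

# Responder range from the post: between 1 in 9 and 1 in 6 of an arm.
responders_low = Fraction(per_arm, 9)          # about 17.8
responders_high = Fraction(per_arm, 6)         # about 26.7

# Assumption from the post: 1/3 declined the walk test, so 2/3 remain.
walk_test_share = Fraction(2, 3)
walk_low = responders_low * walk_test_share    # about 11.9
walk_high = responders_high * walk_test_share  # about 17.8

print(per_arm, float(responders_low), float(walk_low))
```

Even at the upper end of the range, this leaves fewer than 20 responders with walk-test data in an arm.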

At this point it should be obvious that the effect of a small number of outliers is important. If only 3 individuals who were misdiagnosed were included, and fully recovered, this would be enough to account for a substantial change in mean value. Likewise, a small number of negative outliers could change results substantially, which bears on the question of possible harms. Excluding data from a few negative outliers could also have substantial effects on mean values.

Without seeing complete data sets we can't decide if any of the claims made were valid. In any case, it appears that the confidence projected by the authors, who started with 3158 patients referred, then whittled this down, ends up being based on a group which could fit in a modest-sized room. Absent considerable hype we wouldn't even be having this discussion.

Added: I don't want anyone to get into serious arguments defending these specific numbers, there are simply too many assumptions involved. What is important is the extent to which publicized results could depend on such a small group of patients that we could end up arguing about individual cases. This is not a valid basis for setting national policy.

An inference which is less obvious is that the somewhat mysterious process that reduced the initial intake from 3158 to 640, supplying 160 in each arm, assumes great importance if a handful of patients could skew the results. The assumption that these results apply to all such patients depends on unproven claims of diagnostic homogeneity. I'll say more about the consequences later.

Sean · Feb 9, 2016

anciendaze said:
(Note that fatigue was simply assumed to be entirely subjective.)

Indeed, they more-or-less define it – and the entire syndrome – as such. How convenient.

anciendaze said:
There is a definite pattern here, and it is not pretty.

I wouldn't want to be in their shoes when the rest of the world twigs to what is going on here and the implications for all of society. That is going to be ugly as well.

anciendaze said:
But wait, there's more! Recall that 1/3 simply declined the walk test.

Was that at outcome, or including baseline? Was there a big increase in the decline rate between the two?

anciendaze · Feb 9, 2016

Sean said:
...Was that at outcome, or including baseline? Was there a big increase in the decline rate between the two?

Some declined at baseline and some at outcome. Either way they provided no data about the effect of therapy. There was no increase in the rate at which patients declined the test. The problem was that the authors simply assumed the missing data would not reduce the score, which is equivalent to assuming the conclusion that nobody deteriorated as a result of therapy. I might stipulate that some patients got better and some got worse, but that those who got worse were not counted. It scarcely matters because the touted improvement was pitiful.

Sean · Feb 9, 2016

anciendaze said:
It scarcely matters because the touted improvement was pitiful.

Well, exactly.

They are splitting hairs about very small effect sizes, for a small proportion of patients, even if they were legitimate effects.

Not to mention that this was after swinging everything possible in their favour, but they still could only come up with a barely measurable effect on subject self-report measures.

anciendaze · Feb 9, 2016

Now I'm going to make a statement which will anger some people here. I don't doubt there were something like 20 people in the original intake of 3158 who would benefit from counseling and exercise. Even if these were divided between four groups, the 5 in each group would be enough to produce statistically significant changes in measures of outcome. What about the other 3138?

To say that diagnostic criteria for CFS are contentious is understatement. The authors have had considerable influence on these in the UK, and still found a rate of 30% misdiagnosis by other UK doctors in a different paper. Nobody has studied the rate at which the authors err, but even if their rate of misdiagnosis were as low as found among pathologists looking at tissue samples it would still be enough to provide the weak results published.

More serious consequences arise when people with serious diseases are misdiagnosed as having a primary psychological problem called CFS. There is a great deal of anecdotal evidence available saying that such errors do take place.

I'm going to talk about one peculiar disease which I know something about, simply because I know such a sufferer. Most people reading this will not have it, but that is not why I am presenting it here. It is not trivial, having substantial rates of morbidity and mortality. Even if the genetic markers are absent, there are clinical tests that unambiguously show it is not a psychological problem. It can be misdiagnosed as "somatization" or "a conversion disorder" or even "catatonia". The name for this group of diseases is periodic paralysis, and it is quite rare, affecting about 1 in 100,000 live births. Clear signs of onset often don't appear until age 18 or so, making it harder to recognize as a genetic problem. I've discussed this disease with several competent doctors who had never heard of it. I'm not at all sure how they would know they had such a patient. Most known patients went many years without a correct diagnosis.

Patients with this disease tend to have an abnormal response to exercise or carbohydrate metabolism. Because this can be diagnosed as somatization they really could show up in a CFS cohort. Sloppy diagnostic criteria can sweep up a number of rare diseases into a single category; patients with a very different rare autoimmune disorder like IgG4 related disease have also been told they have CFS. If we assume that 99% of the general population do not have ME/CFS, we could estimate that the concentration of unrecognized rare diseases in a CFS cohort could be 100 times the rate estimated in the general population. There might have been 3 such patients in the initial intake for PACE. There are a lot of other rare diseases with mysterious symptoms, so it would be quite difficult to estimate the total number.
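
Here is a rough Python sketch of where the figure of about 3 comes from. It takes the numbers in this post at face value (1 in 100,000 prevalence, 1% of the population with ME/CFS, 3158 referred) and adds the strong assumption that every unrecognized case ends up in the CFS category; the variable names are illustrative.

```python
# Figures taken from the post; none of this is measured PACE data.
prevalence_denominator = 100_000   # about 1 case per 100,000 people
cfs_percent = 1                    # 99% of the population assumed not to have ME/CFS
intake = 3158                      # patients initially referred

# If all unrecognized cases land in the 1% labelled CFS,
# the concentration in that group is 100 times the population rate.
enrichment = 100 / cfs_percent

# Expected number of such patients among those referred.
expected_cases = intake * enrichment / prevalence_denominator

print(enrichment, expected_cases)
```

The result is only as good as the enrichment assumption, which is probably an upper bound for this one disease.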

The question of how you avoid killing such patients who end up with the wrong diagnosis is impossible to answer from the PACE literature.

alex3619 · Mar 19, 2016

anciendaze said:
Nobody has studied the rate at which the authors err

Not quite. In 2000, iirc, in the British Medical Journal, a doctor wrote a letter to the editor. In it he discussed that he was a locum for one Simon Wessely. In the several weeks he was there he rediagnosed half of the patients ... with things like (and I do not recall the list, this may be wrong) asthma and heart disease. I also no longer recollect if these were CFS patients ... they might have been.

This is not a formal study, and involved only a small subset of patients, but it does demonstrate an error rate, which is only to be expected.

helen1 · Mar 19, 2016

alex3619 said:
he rediagnosed half of the patients

Dr Hyde is well known for finding the same thing, that diagnosed CFS patients have other illnesses. He told me that in up to 40% of people consulting him he finds they have something else.

