
BMJ comments on new PACE trial data analysis

Countrygirl

Senior Member
Messages
5,468
Location
UK
@Countrygirl that's a shot of reality, isn't it? :(

The ME specialist I saw advised me to do nothing when I started feeling better again, but never explained WHY, so I left his office thinking now he's gone off the deep end :p

I didn't realise just how realistic it was at the time, though. I was put in a room with one other occupant, who had ME, and she was given strict instructions not to let me stray into the next room, as the doctor did not want me to be shocked. I soon found out why when a nurse asked me to leave my bed and sit with the poor girl next door so that she could leave her bedside. That was a rude awakening and an introduction to the world of very severe ME. The poor young woman was in a coma, hooked up to life-support systems, and not expected to live. I was given instructions to ring the emergency bell if there was any change. I never heard whether she survived, as I ran out of funds and had to leave. It was quite a shock to learn just how serious this illness is.
 

MeSci

ME/CFS since 1995; activity level 6?
Messages
8,231
Location
Cornwall, UK
The ME specialist I saw advised me to do nothing when I started feeling better again, but never explained WHY, so I left his office thinking now he's gone off the deep end :p
Did he say for how long you should not do anything?
 

Mij

Senior Member
Messages
2,353
@MeSci no he did not. I think he said this in response to my overzealous attitude to getting myself back into shape.
 

lansbergen

Senior Member
Messages
2,512
It used to be usual to have a period of 'convalescence' after a significant illness.


It was certainly standard practice in veterinary medicine to try to keep animals as restful as possible during both the infection and the recovery stages.
 

Sean

Senior Member
Messages
7,378
re: the placebo analysis cited above.

I have read it more than once, and it is subtle and needs careful use, and I don't claim to fully understand it yet. But I think it is also one of the more important general findings for us, in that it places serious constraints on what can be claimed about the clinical implications of the placebo effect.

http://www.ncbi.nlm.nih.gov/pubmed/20091554
 

Woolie

Senior Member
Messages
3,263
All symptoms are subjective. Even if you have a broken leg.
@Jonathan Edwards, the problem for me is not the use of subjective measures per se, but the small effect sizes they got using these. The placebo effect is estimated to be much larger in self-report measures than in more objective ones, such as ratings by blinded observers (in fact, some estimate the placebo effect to be close to zero in such situations - see article below).

So in an unblinded study such as this, you would need to demonstrate a really substantial effect of the treatment on self-rated scores to be certain you're not just measuring placebo effects.

A rough calculation of effect sizes from the original PACE paper puts them at the low end of the placebo range, as estimated in the two meta-analyses below. These effects are therefore totally in line with what we'd expect from placebo responding alone.
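As a minimal sketch of the sort of rough calculation meant here (the numbers below are made up purely for illustration, and are not the actual PACE figures):

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardised mean difference (Cohen's d) using the pooled SD."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                          / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

# Made-up illustrative numbers only -- NOT the actual PACE figures.
# Fatigue scores where lower = better, so control minus treatment gives
# a positive d when the treatment group does better:
d = cohens_d(23.8, 7.7, 160, 20.3, 8.0, 160)
print(f"Cohen's d = {d:.2f}")  # about 0.45 -- within typical placebo range
```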

The PACE study looked pretty well designed to me from the original protocol. But they did chance it by not trying to control more for placebo responding. Could have still worked out for them if the self-rated effects were big, but sadly, they weren't. Then what followed (as is unfortunately common in psychological studies) is a lot of exaggeration of significant results and selective reporting of outcomes.

I would like to see better reporting and quality evaluation standards brought in for behavioural studies (CONSORT scores, etc.). I think that would help raise quality a lot.

Hróbjartsson A and Gøtzsche PC. Placebo interventions for all clinical conditions. Cochrane Database Syst Rev 2010; 1. http://summaries.cochrane.org/CD003974/placebo-interventions-for-all-clinical-conditions.
 

Jonathan Edwards

"Gibberish"
Messages
5,256
It seems strange to me that in an experiment a single outcome is preselected and a test is done to see whether that particular variable differs between the two groups, rather than asking what evidence we have in favour of a given hypothesis and what against (in this way we can compare a number of different hypotheses, rather than just saying the data from two groups are or are not the same).

The problem with secondary outcomes (and I assume this is where Bonferroni comes in) is that some may just show an effect that isn't there. But shouldn't we use all the data we can collect and combine the evidence, both for and against? I would go as far as arguing that it is interesting to look at what evidence does and doesn't support a hypothesis and ask why.

Ultimately it's a frequentist vs Bayesian argument, and I guess I'm a believer in Bayesian inference, which to me has always seemed to have more solid foundations and to be closer to logical reasoning, whereas I find significance-testing arguments a bit counterintuitive when I've tried to understand the equations.

Still, I guess for a medical trial to be accepted a frequentist approach needs to be taken.

I think the problem is that there are always a very large number of correlations you can look for in a set of data from a trial. The 'which explanation fits best' approach is in fact very widely used for pilot trials looking at dose-response curves etc. In cancer it is traditional to 'explore' options with small pilot studies using a Bayesian approach and then set up a formal study based on what looks the most plausible choice of protocol.

For cancer it is rather easy because the endpoints are objective and straightforward. For something like ME it is all much more difficult. I don't actually think the study on placebo effects is relevant here: it is a meta-analysis of situations that are, by and large, much less problematic than ME. What if 50% of patients in PACE did not even have ME - were just people who felt tired and wanted to talk to someone?

It gets very complicated if we want to choose the best hypothesis to fit the data. For a start we have to consider the hypotheses that 0%, 20%, 40%, 60% or 80% of the patients actually had the right disease (assuming we are agreed what that is!). We have to consider the hypotheses that what mattered was just seeing a therapist, or that it was component 1 of the CBT instructions, or 2, or 3, or 2 plus 6, or whatever, that was having an effect. (Component 2 might turn up in the GET instructions too, but not 3, etc.)

The practical side of the problem is very neatly demonstrated by the rituximab study in Norway. They stuck to protocol and took improvement in fatigue at 3 months as their primary endpoint, as stipulated in advance. It failed to show a difference. However, there was a difference in fatigue at 6 months. It became clear that 6 months was the more appropriate endpoint for biological reasons, but the authors correctly pointed out that their primary endpoint had failed. This is essential because, considering the number of things measured and the number of time points used, if they had chosen the endpoint that looked best rather than sticking to their advance choice, they could almost certainly have found something that looked positive - just because of noise in the data.
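As a minimal sketch of that last point - how testing many endpoints and time points under a pure null, then reporting the best-looking one, manufactures 'positive' results (independent endpoints are assumed here, which if anything overstates the inflation relative to correlated real-world measures):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_endpoints, n_per_arm = 2000, 30, 100  # e.g. 10 measures x 3 time points

hits = 0
for _ in range(n_sims):
    # Null world: both arms drawn from the SAME distribution, so any
    # "significant" endpoint is pure noise.
    pvals = [stats.ttest_ind(rng.normal(size=n_per_arm),
                             rng.normal(size=n_per_arm)).pvalue
             for _ in range(n_endpoints)]
    hits += min(pvals) < 0.05  # cherry-pick the best-looking endpoint

print(f"P(>=1 'significant' endpoint under the null) ~ {hits / n_sims:.2f}")
# ~0.79 at alpha = 0.05; a Bonferroni threshold of 0.05/30 pulls it back to ~0.05.
```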
 

Jonathan Edwards

"Gibberish"
Messages
5,256
@Jonathan Edwards, the problem for me is not the use of subjective measures per se, but the small effect sizes they got using these. The placebo effect is estimated to be much larger in self-report measures than in more objective ones, such as ratings by blinded observers (in fact, some estimate the placebo effect to be close to zero in such situations - see article below).

So in an unblinded study such as this, you would need to demonstrate a really substantial effect of the treatment on self-rated scores to be certain you're not just measuring placebo effects.

A rough calculation of effect sizes from the original PACE paper puts them at the low end of the placebo range, as estimated in the two meta-analyses below. These effects are therefore totally in line with what we'd expect from placebo responding alone.

The PACE study looked pretty well designed to me from the original protocol. But they did chance it by not trying to control more for placebo responding. Could have still worked out for them if the self-rated effects were big, but sadly, they weren't. Then what followed (as is unfortunately common in psychological studies) is a lot of exaggeration of significant results and selective reporting of outcomes.

I would like to see better reporting and quality evaluation standards brought in for behavioural studies (CONSORT scores, etc.). I think that would help raise quality a lot.

Hróbjartsson A and Gøtzsche PC. Placebo interventions for all clinical conditions. Cochrane Database Syst Rev 2010; 1. http://summaries.cochrane.org/CD003974/placebo-interventions-for-all-clinical-conditions.

I agree that effect size is important. When effects get very large it becomes difficult to attribute them to placebo in many cases. However, as indicated above, I do not think the Hróbjartsson and Gøtzsche analysis is actually of any use to us for ME trials. Because of uncertainties in recruiting, and because fatigue is such a difficult thing to pin down and perhaps the most motivation-dependent of all symptoms, my view is that placebo effects in other contexts are uninformative. If you can (apparently) get fatigue to disappear in MS patients with relaxation therapy without altering their neuropathology, then it seems likely that the placebo effect of talking therapies can be very strong. In fact Knoop suggests that CBT just IS a placebo effect.

In this context the PACE trial cannot begin to be well designed. You simply cannot start out with an unblinded therapy and then use a subjective end point. Nobody in a branch of medicine other than psychiatry would take that seriously and it is about time the psychiatrists realised they should fall in line with a scientific approach.

And I honestly suspect that very little of this has anything to do with the classical conception of the placebo effect of making the patient feel better. I think it mostly has to do with patients saying what they think the therapist wants to hear. I would get that constantly in my clinic with arthritis patients. I would ask 'How did you get on with the tablets?' and they would say 'Oh yes, they are very good, doctor', and I would say 'Are you much better then, has the swelling gone down?' and they would say 'Yes, at least I think it has a bit', so I would say 'Not that much then, so maybe the tablets did not really help?' and they would say 'Probably not, you're right'. As soon as the patient realised that I did not mind if my treatment was no good, they were honest. I strongly suspect that this does not apply in most trials - especially as patients are frightened of being kicked out of the clinic if they do not help with trials (as they are in many trial-based units).
 

A.B.

Senior Member
Messages
3,780
And I honestly suspect that very little of this has anything to do with the classical conception of the placebo effect of making the patient feel better.

Which itself is probably a myth. In the original paper on the placebo, it was assumed that any improvement in patients taking a placebo was due to the placebo itself.

The powerful placebo effect: fact or fiction?

Placebo effect and placebo concept: a critical methodological and conceptual analysis of reports on the magnitude of the placebo effect.

The authors conclude that the literature relating to the magnitude and frequency of the placebo effect is unfounded and grossly overrated, if not entirely false. They pose the question of whether the existence of the so-called placebo effect is itself not largely, or indeed totally, illusory.
 

mango

Senior Member
Messages
905
So, about that hypothetical study of CBT for car engine problems...

psychobabblers.jpg


;)
 

MeSci

ME/CFS since 1995; activity level 6?
Messages
8,231
Location
Cornwall, UK
@MeSci no he did not. I think he said this in response to my overzealous attitude to getting myself back into shape.

Good, really. I suspect that most of our doctors advised the opposite, as mine did - encouraging futile and harmful attempts to get fit. :(
 

user9876

Senior Member
Messages
4,556
The practical side of the problem is very neatly demonstrated by the rituximab study in Norway. They stuck to protocol and took improvement in fatigue at 3 months as their primary endpoint, as stipulated in advance. It failed to show a difference. However, there was a difference in fatigue at 6 months. It became clear that 6 months was the more appropriate endpoint for biological reasons, but the authors correctly pointed out that their primary endpoint had failed. This is essential because, considering the number of things measured and the number of time points used, if they had chosen the endpoint that looked best rather than sticking to their advance choice, they could almost certainly have found something that looked positive - just because of noise in the data.

It feels strange that there is a random element in whether those designing a trial correctly guess, in advance, which measure will show an effect. The current system seems to have moved from the question 'is there a counterexample to the null hypothesis on one variable?' to 'can those designing the trial guess, before it starts, which variable will provide the counterexample to the null hypothesis?'. In areas where there is more experience, such as CBT/GET, those planning trials can choose outcomes based on measures that seemed to work before and discard ones that didn't. With Crawley's SMILE trial, she seems to have done a small trial followed by a larger one, but with changes to the outcomes based on the smaller trial.

Shouldn't there be a biological/psychological model used to interpret all the data, so that we can see which data fit the model and which don't? If a trial measures as much as possible, say 10 variables, and they don't correlate well enough to support the trial outcome, don't we need a good explanation of why they don't? There may also be additional measures that try to separate the different potential mechanisms by which a treatment may work. Here I'm assuming that, instead of pre-picking one variable and declaring it key, what needs to be done is to pre-pick potential explanations and see which fit the data. If timings are key, as seems to be the case in the rituximab trials, then shouldn't a dynamic model be specified that includes sample time as a variable and tries to interpret all the data in that light?

Shouldn't the question be: can a coherent hypothesis be put together that explains all the data, including why outlying data don't fit the trend, based on mechanisms we understand, or that proposes relationships that can be tested in further experiments? Where some measures improve but others don't, even though the trial hypothesis says they should, doesn't that invalidate the hypothesis? It may mean a treatment is still useful, in that it affects some aspects of a disease, but it would suggest that the way the treatment works is not understood.

I would prefer to look at the full data and have that used to give an explanation of how a treatment works, doesn't work, or partially works. That would of course mean publishing all the data, not allowing chosen crumbs of it to be published in papers over a number of years. These days there are plenty of modelling techniques that can try to link variables, including looking for leading and lagging variables. Some used to be computationally too expensive, but that is no longer the case. Any fitted model does, though, need to be judged on how well it explains the data.
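As a minimal sketch of the leading/lagging idea (toy data, not from any trial):

```python
import numpy as np

def lagged_corr(x, y, max_lag):
    """Correlation of x against y shifted by each lag; a peak at positive
    lag k suggests x leads y by k samples."""
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag > 0:
            a, b = x[:-lag], y[lag:]
        elif lag < 0:
            a, b = x[-lag:], y[:lag]
        else:
            a, b = x, y
        out[lag] = np.corrcoef(a, b)[0, 1]
    return out

# Toy data: y is a noisy copy of x delayed by 2 samples.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = np.roll(x, 2) + 0.5 * rng.normal(size=200)
corrs = lagged_corr(x, y, max_lag=5)
print(max(corrs, key=corrs.get))  # 2 -> x leads y by two samples
```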

In this PACE mediation paper they basically fit two linear regression models, find a bit of correlation, and call that mediation. They give no reason to believe the correctness of their model (i.e. linear relationships between variables, and the other underlying linear-regression assumptions), nor do they try to invalidate their hypothesis by testing concurrent samples of their mediator and outcome variables rather than picking later outcome variables. So all they seem to have established is some vague correlation between a couple of variables. I suspect what is more interesting is that there are not more correlations between mediator variables and outcome variables, because the lack of a relationship should tell us something. What I don't see is any explanatory value in their analysis.
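For concreteness, here is a stripped-down sketch of the generic two-regression (product-of-coefficients) procedure being described - simulated data and invented variable names, not the paper's actual model. Note that the fit alone cannot distinguish this from reverse causation or a common cause, which is exactly the objection above:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
treat = rng.integers(0, 2, n).astype(float)   # 0 = control, 1 = therapy
# Toy data generated so that treatment -> mediator -> outcome:
mediator = 0.5 * treat + rng.normal(size=n)   # e.g. a beliefs questionnaire
outcome = 0.4 * mediator + 0.1 * treat + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients, with an intercept column prepended."""
    X = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

a = ols(treat[:, None], mediator)[1]                     # treatment -> mediator
b = ols(np.column_stack([treat, mediator]), outcome)[2]  # mediator -> outcome
print(f"indirect (mediated) effect a*b = {a * b:.3f}")   # ~0.2 by construction
```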

In PACE and other CBT trials we seem to have the situation where self-reported scales improve a bit but activity measures don't, which I think is an interesting result. But we don't actually know, because the data vectors across outcomes (primary and secondary) seem not to have been analysed. We don't know whether those with the biggest improvement in one primary outcome also had the biggest improvement in the second primary outcome, or how these relate to the secondary outcomes. Hence we don't know whether the different outcomes are mutually supporting or contradictory. I find this worrying.
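A minimal sketch of the missing analysis being described - correlating per-patient change scores across outcomes (all data made up):

```python
import numpy as np

# Made-up per-patient change scores on three outcomes:
rng = np.random.default_rng(3)
n = 150
fatigue = rng.normal(size=n)                          # self-report change
function = 0.7 * fatigue + 0.7 * rng.normal(size=n)   # self-report, related
steps = rng.normal(size=n)                            # actometer, unrelated

print(np.round(np.corrcoef([fatigue, function, steps]), 2))
# If the self-report changes correlate with each other but not with the
# activity change, the outcomes are not mutually supporting.
```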

Edit (addition):
Something like the SF-36 physical function scale could be viewed (and I would argue should be viewed) as a composite outcome measure: 1 part how easy personal care is, 2 parts climbing stairs, 3 parts walking, 2 parts something vague about exercise, 1 part bending, kneeling or stooping, and 1 part lifting things (i.e. 10 questions, but about different physical activities).
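A minimal sketch of that composite view, assuming the standard SF-36 item scoring (1-3 per item, raw sum rescaled to 0-100); the grouping labels are just the informal reading above:

```python
# The ten SF-36 physical-function items, grouped as described above.
PF_ITEM_COUNTS = {
    "vigorous/moderate exercise": 2,
    "lifting or carrying": 1,
    "climbing stairs": 2,
    "bending, kneeling, stooping": 1,
    "walking various distances": 3,
    "bathing or dressing (personal care)": 1,
}

def pf_score(responses):
    """responses: 10 item scores, each 1 (limited a lot), 2 (limited a
    little) or 3 (not limited). Returns the usual 0-100 rescaling."""
    assert len(responses) == sum(PF_ITEM_COUNTS.values()) == 10
    return (sum(responses) - 10) / 20 * 100

a = [3, 3, 3, 3, 3, 3, 1, 1, 1, 3]  # limited a lot only on the walking items
b = [1, 1, 1, 3, 3, 3, 3, 3, 3, 3]  # limited a lot only on exercise/lifting
print(pf_score(a), pf_score(b))     # 70.0 70.0 -- same score, very different patients
```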
 

Mij

Senior Member
Messages
2,353
Good really. I suspect that most of our doctors advised the opposite, as mine did - encouraged futile and harmful attempts to get fit. :(

It's criminal. :mad: Why not just admit they don't understand this illness? I still didn't listen to my doctor and continued to exercise until I finally realized I was getting worse.
 

Jonathan Edwards

"Gibberish"
Messages
5,256
It feels strange that there is a random element in whether those designing a trial correctly guess, in advance, which measure will show an effect. The current system seems to have moved from the question 'is there a counterexample to the null hypothesis on one variable?' to 'can those designing the trial guess, before it starts, which variable will provide the counterexample to the null hypothesis?'.

Something like the SF-36 physical function scale could be viewed (and I would argue should be viewed) as a composite outcome measure: 1 part how easy personal care is, 2 parts climbing stairs, 3 parts walking, 2 parts something vague about exercise, 1 part bending, kneeling or stooping, and 1 part lifting things (i.e. 10 questions, but about different physical activities).

It is all to do with making sure people do not cherry-pick data when the effect is around the noise level. If ME symptoms fluctuate in time and also shift from one thing to another, then a real benefit could be masked by chance shifts in symptoms. There is no 'guessing the right measure', really. You might be guessing which measure shows up because, by luck, there is less noise in that one, but it could have been a different measure. All this comes out in the wash if the effect is big enough to stick out above the noise: then it does not really matter which measure you pick - they should all show an effect.
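A quick simulation of that point, assuming ten independent measures that all carry the same true effect:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n_per_arm, n_measures, reps = 100, 10, 400

def fraction_significant(true_d):
    """Average fraction of the measures reaching p < 0.05 when every
    measure carries the same true standardised effect size."""
    frac = 0.0
    for _ in range(reps):
        p = [stats.ttest_ind(rng.normal(true_d, 1, n_per_arm),
                             rng.normal(0, 1, n_per_arm)).pvalue
             for _ in range(n_measures)]
        frac += np.mean(np.array(p) < 0.05)
    return frac / reps

for d in (0.1, 0.3, 0.8):
    print(f"d = {d}: ~{fraction_significant(d):.0%} of measures significant")
# d = 0.1 -> ~10%: which measure 'comes up' is essentially luck.
# d = 0.8 -> ~100%: every measure shows the effect, whichever you pre-picked.
```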

The other difficulty is that although it might seem there should be one hypothesis beforehand in practice there will be many. Fluge and Mella were testing the hypothesis that rituximab helped ME symptoms. They discovered that it might have been better to ask if it helped ME symptoms with the time course typical of rituximab - which is a bit unexpected because it is delayed.

I quite see your puzzlement, and the reality is that your Bayesian approach can be, and is, applied post hoc to influence how seriously people take a result. But if you cannot pick a best endpoint and get a statistically significant result on it, all things being equal, it is a fair guide to the result being inconclusive at best - and needing repeating if taken seriously at all.
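For what the post hoc Bayesian gloss can look like, here is a minimal sketch comparing a null against one alternative with a Bayes factor - made-up summary numbers and an assumed Normal(0, 1) prior on the effect:

```python
import numpy as np
from scipy import stats

# Entirely made-up summary numbers: an observed mean difference and its SE.
diff, se = 1.2, 0.6

# H0: true effect is exactly 0.
# H1: true effect drawn from a Normal(0, 1) prior (a modelling assumption).
like_h0 = stats.norm.pdf(diff, loc=0, scale=se)
# Under H1 the marginal likelihood integrates the normal likelihood over the
# normal prior, giving another normal with variance se^2 + 1:
like_h1 = stats.norm.pdf(diff, loc=0, scale=np.sqrt(se**2 + 1.0))
print(f"Bayes factor (H1 vs H0) = {like_h1 / like_h0:.1f}")  # ~2.2: weak evidence
```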
 

Esther12

Senior Member
Messages
13,774
But if you cannot pick a best endpoint and get a statistically significant result on it, all things being equal, it is a fair guide to the result being inconclusive at best - and needing repeating if taken seriously at all.

But the 'best' endpoint for those wanting to promote the efficacy of CBT/GET may be different to what patients think is the 'best' endpoint. Researchers being able to pick an outcome that indicates efficacy in a nonblinded trial does not mean that other outcome measures are less important.

PACE dropped actometers as an outcome measure after previous CBT trials had found that CBT led to an improvement in questionnaire scores but not in measured activity levels. Researchers with an interest in claiming that the treatments they have developed are effective will be able to learn which outcome measures are best suited to reflecting their views, and which are most likely to undermine them. When using nonblinded trials this leaves a lot of room for misleading results.

User9876 mentioned the changes made to the primary outcome measures used in the SMILE trial after early results had been looked at - the interests and biases of researchers can play an important role in their selection of primary outcome measures. I think that there is some value in just collecting a range of outcome measures, and then making the anonymised data publicly available for its meaning to be discussed and debated by as wide a range of people as possible. I think that patients would often have more to contribute to this than the researchers who developed the interventions being tested.

Before treatments could be seen as anything other than experimental we would then need replications with pre-specified endpoints, but I don't think we're there with CFS treatments yet.
 

Jonathan Edwards

"Gibberish"
Messages
5,256
But the 'best' endpoint for those wanting to promote the efficacy of CBT/GET may be different to what patients think is the 'best' endpoint. Researchers being able to pick an outcome that indicates efficacy in a nonblinded trial does not mean that other outcome measures are less important.

But that is an issue about the difference between doctors' objectives and patients' objectives rather than the statistical issue we were debating. In the end I doubt it will turn out to have been a best endpoint because it will become accepted fairly soon that the trial is uninterpretable - particularly if the authors continue to produce further data that shoots their own hypotheses in the foot.
 

Jonathan Edwards

"Gibberish"
Messages
5,256
How common are symptom shifts in ME, and in autoimmune diseases?

Judging by stories people tell on PR - which is my main source - symptom shifts seem common in ME over a period of years. People often note that a symptom was not there at the beginning and only became evident later, even if other things got a bit better.

In autoimmunity symptom shifting varies a lot. In lupus it is typical. Each flare will be slightly different from the last in its range of features, even if each patient tends to use a particular repertoire of features. In RA symptom shifts used to be well recognised, with patients moving into new phases with vasculitis or nodules or pericarditis and different joints coming to dominate the picture. In thyroid disease hypothyroidism can shift to hyperthyroidism or vice versa. In other conditions things may be more stereotyped.