
An open letter to Psychological Medicine, again! by D Tuller et al

Messages
44
That's not quite right. Patients were not required to fail every individual aspect of the trial entry criteria (which were: assessed as meeting the Oxford criteria, an SF36-PF score of 65 or under, and a Chalder Fatigue bimodal score of 6 or over); they were only required not to fulfil all of them at once. So a patient could still report a decline in SF36-PF score from baseline and be classed as recovered, but they could not fulfil all aspects of the trial entry criteria and be classed as recovered.

Simple!

Thanks for the replies, Esther12 and others. I am still not sure if I understand the issue correctly or not. I wonder if the following reasoning is correct: Only patients with both fatigue and physical function in the normal range were considered in the composite assessment of recovery at the end of the trial, so they had to have a physical function score of 60 or greater at the end of the trial, and for these patients to have deteriorated they would have had to have started the trial with a score of 65. However, they could not be included as recovered unless simultaneously with their deteriorated physical function they improved their fatigue score from the trial entry criterion of greater than or equal to 6 on the Chalder Fatigue Questionnaire bimodal scoring to the recovery threshold of 18 or less with Likert scoring and also rated themselves as much or very much better. But as a patient reporting a decline in baseline physical function even from 65 to 60 would presumably not rate themselves as much better or very much better on the CGI (and would most likely still have disabling fatigue on the CFQ), they would not be classified as meeting the composite recovery criteria or counted as recovered? Or are Tuller and other critics only speaking about the individual outcome criteria in isolation and ignoring the fact that the trial recovery definition is made up of several combined criteria?
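A minimal sketch of how the composite definition discussed above might be checked, using only the thresholds quoted in this thread (SF36-PF of 60 or more, Chalder Likert score of 18 or less, CGI of much or very much better, and not still meeting all of the entry criteria). The `Participant` fields, the CGI coding (1 = very much better, 2 = much better) and the function names are assumptions made for illustration; this is not the trial's analysis code.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    pf_baseline: int        # SF36-PF at trial entry (0-100)
    pf_end: int             # SF36-PF at end of trial
    cfq_likert_end: int     # Chalder Fatigue, Likert scoring (0-33)
    cfq_bimodal_end: int    # Chalder Fatigue, bimodal scoring (0-11)
    cgi_end: int            # assumed coding: 1 = very much better, 2 = much better
    meets_oxford_end: bool  # still assessed as meeting Oxford criteria


def meets_all_entry_criteria(p: Participant) -> bool:
    """True only if every entry criterion quoted in the thread still holds."""
    return p.meets_oxford_end and p.pf_end <= 65 and p.cfq_bimodal_end >= 6


def classed_as_recovered(p: Participant) -> bool:
    """Composite check: every component must pass at the same time."""
    return (
        p.pf_end >= 60
        and p.cfq_likert_end <= 18
        and p.cgi_end in (1, 2)
        and not meets_all_entry_criteria(p)
    )


# Hypothetical participant whose physical function fell from 65 to 60.
example = Participant(
    pf_baseline=65, pf_end=60,
    cfq_likert_end=18, cfq_bimodal_end=4,
    cgi_end=2, meets_oxford_end=True,
)
print(example.pf_end < example.pf_baseline, classed_as_recovered(example))
```

Under these assumptions the hypothetical participant declines on the SF36-PF and is still classed as recovered, but only because the fatigue and CGI components are also met, which is the point being asked about.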
 

Esther12

Senior Member
Messages
13,774
Isn't that data handy?

I was just about to say that the actual data from PACE shows that the way people complete these sorts of questionnaires can be unpredictable. I'm rubbish with spreadsheets, but did have a little look at it, and there were lots of examples of weird-looking results. tbh, it wouldn't surprise me if people were just a bit lazy with the questionnaires.

Also, even if only those reporting improvements in SF36-PF scores also rated themselves much or very much better, the fact that participants could potentially decline on the SF36-PF and still be classed as recovered still illustrates a problem with the way they defined recovery.
 
Messages
44
10 people in the trial deteriorated on the SF-36 physical functioning subscale but rated themselves as much better or very much better on the CGI.
That's surprising, although the number is still very small in comparison to the number of participants overall. I am surprised there were any, though. Is there any explanation for this? Is it that their fatigue was much improved despite worse physical function, or something else?
 

Dolphin

Senior Member
Messages
17,567
That's surprising, although the number is still very small in comparison to the number of participants overall. I am surprised there were any, though. Is there any explanation for this? Is it that their fatigue was much improved despite worse physical function, or something else?


trialarm  cfqlsov0  cfqbsov0  pcfqls52  pcfqbs52  dgiq52F  pfov0  p_pfov52  (unlabelled)  pgiq52F  wtmts.0  wtmts.52  o_ov52cor
3         24        9         19.00     8.00      3        55     35.00     -20.00        2        556      650       0
2         33        11        33.00     11.00     2        20     0.00      -20.00        2        315      311       1
4         28        9         19.00     6.00      1        55     45.00     -10.00        1        305      367       1
2         33        11        33.00     11.00     2        40     30.00     -10.00        2        360      #NULL!    1
2         27        10        8.00      2.00      1        45     35.00     -10.00        2        412      380       0
3         22        11        15.00     4.00      #NULL!   55     45.00     -10.00        2        367      377       1
2         31        11        16.00     5.00      1        55     50.00     -5.00         1        377      425       0
3         32        11        19.00     8.00      3        50     45.00     -5.00         2        520      570       0
2         32        11        28.00     11.00     2        35     30.00     -5.00         2        341      321       1
2         26        11        25.00     11.00     2        35     30.00     -5.00         2        200      319       1
 

Barry53

Senior Member
Messages
2,391
Location
UK
That's surprising, although the number is still very small in comparison to the number of participants overall. I am surprised there were any, though. Is there any explanation for this? Is it that their fatigue was much improved despite worse physical function, or something else?
It is actually an excellent illustration of how vulnerable self-reporting outcomes are, when devoid of sanity checks, to prevailing-perception induced skew. It is all based on prevailing subjective perceptions, not backed up by objective measures.

I think it also indicates that CBT can in some cases, as with many other physical diseases, help as a coping strategy, and some people may feel better with how they are coping with their problems. The questionnaires would then most likely conflate that self-perceived coping effect, misrepresenting it as real improvement.

And even though the highlighted number of participants here is small, it clarifies very nicely that there will also be a good many other participants whose reported outcomes are also going to be skewed.
 

Barry53

Senior Member
Messages
2,391
Location
UK
trialarm  cfqlsov0  cfqbsov0  pcfqls52  pcfqbs52  dgiq52F  pfov0  p_pfov52  (unlabelled)  pgiq52F  wtmts.0  wtmts.52  o_ov52cor
3         24        9         19.00     8.00      3        55     35.00     -20.00        2        556      650       0
2         33        11        33.00     11.00     2        20     0.00      -20.00        2        315      311       1
4         28        9         19.00     6.00      1        55     45.00     -10.00        1        305      367       1
2         33        11        33.00     11.00     2        40     30.00     -10.00        2        360      #NULL!    1
2         27        10        8.00      2.00      1        45     35.00     -10.00        2        412      380       0
3         22        11        15.00     4.00      #NULL!   55     45.00     -10.00        2        367      377       1
2         31        11        16.00     5.00      1        55     50.00     -5.00         1        377      425       0
3         32        11        19.00     8.00      3        50     45.00     -5.00         2        520      570       0
2         32        11        28.00     11.00     2        35     30.00     -5.00         2        341      321       1
2         26        11        25.00     11.00     2        35     30.00     -5.00         2        200      319       1
There is an extra column in the data here @Dolphin, the -ve values. What are they? Just twigged - the SF-36 differences.
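A quick way to confirm that reading of the extra column: the sketch below copies the `pfov0`, `p_pfov52` and unlabelled values from the ten rows posted above and checks that the unlabelled value equals the 52-week score minus the baseline score. Treating `pfov0` and `p_pfov52` as the baseline and 52-week SF-36 physical function scores is an assumption based on the column names; the function name is made up for illustration.

```python
# (pfov0, p_pfov52, unlabelled column), copied from the ten rows posted above.
ROWS = [
    (55, 35.0, -20.0),
    (20, 0.0, -20.0),
    (55, 45.0, -10.0),
    (40, 30.0, -10.0),
    (45, 35.0, -10.0),
    (55, 45.0, -10.0),
    (55, 50.0, -5.0),
    (50, 45.0, -5.0),
    (35, 30.0, -5.0),
    (35, 30.0, -5.0),
]


def unlabelled_is_pf_change(rows):
    """True if the unlabelled value is (52-week score - baseline score) in every row."""
    return all(end - start == diff for start, end, diff in rows)


print(unlabelled_is_pf_change(ROWS))
```

For these ten rows the check holds in every case, which fits the unlabelled column being the change in SF-36 physical function.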
 

slysaint

Senior Member
Messages
2,125
It is all based on prevailing subjective perceptions, not backed up by objective measures.
I am probably stating the obvious here, but has anyone scrutinised how the Chalder Fatigue Scale was created in the first place (1993)? It seems to me, like most of the studies that use it, that it was tweaked until it provided the results they wanted.
I found this paper where they were assessing its usability for fatigue in M.S.:
Chalder Fatigue Questionnaire-MS - King's College London

https://www.google.co.uk/url?sa=t&r..._.docx&usg=AFQjCNH3g1GFld7eIk3ua6XkTm3lgKxtAg
 

Esther12

Senior Member
Messages
13,774
I am probably stating the obvious here, but has anyone scrutinised how the Chalder Fatigue Scale was created in the first place (1993)? It seems to me, like most of the studies that use it, that it was tweaked until it provided the results they wanted.
I found this paper where they were assessing its usability for fatigue in M.S.:
Chalder Fatigue Questionnaire-MS - King's College London

https://www.google.co.uk/url?sa=t&rct=j&q=&esrc=s&source=web&cd=3&cad=rja&uact=8&ved=0ahUKEwi4-834oPTSAhXMKcAKHWEkCKkQFggtMAI&url=https://kclpure.kcl.ac.uk/portal/files/37819671/PURE_Revised_MS_Fatigue_CFA_final_submitted_MS_JUNE_2015_v4_.docx&usg=AFQjCNH3g1GFld7eIk3ua6XkTm3lgKxtAg

There was some Q&A with Wessely where he talked about creating it - sounded like he just jotted some questions down on the back of a napkin.

Found it:

There was no instrument available to measure subjective fatigue, so I simply invented one, which would later get modified into the Chalder Fatigue Scale, which also became a citation ‘hit’. And basically that was that.

http://www.meassociation.org.uk/201...-fatigue-syndrome-journal-article-march-2012/
 

user9876

Senior Member
Messages
4,556
I am probably stating the obvious here, but has anyone scrutinised how the Chalder Fatigue Scale was created in the first place (1993)? It seems to me, like most of the studies that use it, that it was tweaked until it provided the results they wanted.
I found this paper where they were assessing its usability for fatigue in M.S.:
Chalder Fatigue Questionnaire-MS - King's College London

https://www.google.co.uk/url?sa=t&rct=j&q=&esrc=s&source=web&cd=3&cad=rja&uact=8&ved=0ahUKEwi4-834oPTSAhXMKcAKHWEkCKkQFggtMAI&url=https://kclpure.kcl.ac.uk/portal/files/37819671/PURE_Revised_MS_Fatigue_CFA_final_submitted_MS_JUNE_2015_v4_.docx&usg=AFQjCNH3g1GFld7eIk3ua6XkTm3lgKxtAg

I think it is a terrible questionnaire. We should not call it a scale because it is not one. It has a random series of questions, and it treats physical fatigue as more important than mental fatigue because of the way the questions are chosen. In their paper they acknowledge that mental and physical fatigue can vary differently (that's what their PCA analysis says), so the score doesn't even necessarily go down as overall fatigue reduces. The upshot is that the only statistic that should be quoted is the mode.

Then there are the two marking schemes. PACE say that they changed from a bimodal to a Likert scheme to increase accuracy. But that is highly misleading: it is not like measuring in mm rather than cm. They are different marking schemes, and with the same answer set patient A may be more fatigued than patient B under one scheme while the opposite is true under the other. In effect this means that one or the other, or both, are not linear scales, and so it is not valid to quote the mean or SD. There is no evidence I have come across to suggest which marking scheme is more linear than the other and hence which is more valid. Within the PACE and FINE data there are patients who both improved and got worse depending on the marking scheme.
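A minimal sketch of that point, assuming the usual form of the questionnaire: 11 items, each answered on a four-option scale, scored 0, 1, 2, 3 under Likert marking and 0, 0, 1, 1 under bimodal marking. The answer sets below are invented for illustration and are not taken from the PACE or FINE data.

```python
LIKERT = (0, 1, 2, 3)    # per-item score for answer options 0..3
BIMODAL = (0, 0, 1, 1)   # same options collapsed to absent/present


def score(answers, scheme):
    """Total score for 11 answers, each an option index from 0 to 3."""
    if len(answers) != 11 or any(a not in range(4) for a in answers):
        raise ValueError("expected 11 answers, each in 0..3")
    return sum(scheme[a] for a in answers)


# Two hypothetical patients: the ranking flips with the marking scheme.
patient_a = [2] * 6 + [0] * 5   # moderate problems on six items
patient_b = [3] * 5 + [0] * 6   # severe problems on five items

# One hypothetical patient at two time points: the direction of change flips.
baseline = [2] * 6 + [1] * 5
followup = [3] * 5 + [1] * 6

for name, answers in [("A", patient_a), ("B", patient_b),
                      ("baseline", baseline), ("followup", followup)]:
    print(name, "Likert:", score(answers, LIKERT), "bimodal:", score(answers, BIMODAL))
```

With these invented answers, patient A scores 6 bimodal and 12 Likert while patient B scores 5 bimodal and 15 Likert, so which of them counts as more fatigued depends on the scheme. Likewise the single patient goes from 6 to 5 on bimodal marking (better) but from 17 to 21 on Likert marking (worse).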

The questions are confusing in that they ask about change rather than state, and this is likely to lead to inconsistencies over time.

I think the judgement of anyone using such a scale has to be called into question, especially statisticians. I would also say the judgement of any reviewers and funding agencies who allow such a questionnaire to be used is highly suspect.
 

Barry53

Senior Member
Messages
2,391
Location
UK
Then there are the two marking schemes. PACE say that they changed from a bimodal to a Likert scheme to increase accuracy. But that is highly misleading: it is not like measuring in mm rather than cm. They are different marking schemes, and with the same answer set patient A may be more fatigued than patient B under one scheme while the opposite is true under the other. In effect this means that one or the other, or both, are not linear scales, and so it is not valid to quote the mean or SD. There is no evidence I have come across to suggest which marking scheme is more linear than the other and hence which is more valid. Within the PACE and FINE data there are patients who both improved and got worse depending on the marking scheme.
Reading this makes me realise: Just talking about medical trials generally, if a trial is to be correctly peer reviewed, then should that not mean that all the component parts that contribute should, themselves, have been verified or peer reviewed in some way? It is only as strong as the weakest link. So if a trial is going to measure outcomes using Method X, then surely that must mean Method X itself has to have been fully validated, else that part of the outcome cannot itself pass a peer review ... surely? You cannot just say "Well, we invented Method X because it suited us to use it, and because we are such clever bar-stewards we must be right so don't argue?!".

So the fatigue score should have been peer reviewed ages ago, before it was ever allowed to be used in such a life-changing clinical trial. Other scientists and mathematicians should have had their chance to identify the flaws in it, so it could be honed into something viable. Same for any other methods or practices employed.

This whole PACE thing is like an archaeological dig into a midden.
 

user9876

Senior Member
Messages
4,556
Reading this makes me realise: Just talking about medical trials generally, if a trial is to be correctly peer reviewed, then should that not mean that all the component parts that contribute should, themselves, have been verified or peer reviewed in some way? It is only as strong as the weakest link. So if a trial is going to measure outcomes using Method X, then surely that must mean Method X itself has to have been fully validated, else that part of the outcome cannot itself pass a peer review ... surely? You cannot just say "Well, we invented Method X because it suited us to use it, and because we are such clever bar-stewards we must be right so don't argue?!".

So the fatigue score should have been peer reviewed ages ago, before it was ever allowed to be used in such a life-changing clinical trial. Other scientists and mathematicians should have had their chance to identify the flaws in it, so it could be honed into something viable. Same for any other methods or practices employed.

This whole PACE thing is like an archaeological dig into a midden.

I think that, as it was published, it was peer reviewed. But that doesn't mean it has the properties that are necessary for the trial or for the way it is used in the trial. I don't think the SF36 is a great deal better, in that the questions clearly do not give equal spacing when judging physical abilities, hence the stats used are wrong.

The real problem is that no one looks: they see a 'scale' mentioned, and if it's been used before they say that is fine. They don't question the characteristics it has. By contrast, when I worked on signal processing we would try to understand the characteristics of the equipment we were using so we could understand how the algorithms we developed might fail. Now I do computer security, and I think a large number of vulnerabilities arise because developers don't think about how the function calls they use work and hence use them in unsafe ways. But people are trying to do something about that.

I keep thinking that one of the real issues is the lack of a formalism behind medical trials and so no desired properties are stated and hence none are checked.
 

slysaint

Senior Member
Messages
2,125
I think its use is not that widespread outside a small group in the UK and possibly the Netherlands.
Google it... they use it all over the place now, and not just for CFS:
https://www.ncbi.nlm.nih.gov/pubmed/17324680
Cross-cultural validation of the Chalder Fatigue Questionnaire in Brazilian primary care.

https://www.google.co.uk/url?sa=t&rct=j&q=&esrc=s&source=web&cd=6&ved=0ahUKEwjBu4eR3fTSAhXEIMAKHVCoBEMQFghGMAU&url=http://www.mdpi.com/1660-4601/13/1/147/pdf&usg=AFQjCNHSIqMwX6tDxZ5JDk9KRmJWkS8rHg&bvm=bv.150729734,d.d24&cad=rja

Reliability and Construct Validity of Two Versions of Chalder Fatigue Scale among the General Population in Mainland China

ETA: just looking again at these, and Simon Wessely was involved in the first one.
 

Barry53

Senior Member
Messages
2,391
Location
UK
I keep thinking that one of the real issues is the lack of a formalism behind medical trials and so no desired properties are stated and hence none are checked.
Exactly. As a design engineer myself I cannot help feeling quite appalled at how ad hoc and downright lackadaisical some of the clinical trial processes come across as being, but PACE and PR have really been my only exposure to it.

I would love to see a Health and Safety perspective on medical research, and particularly PACE ... I think it would show up some interesting observations.