PACE Trial and PACE Trial Protocol

Graham

Senior Moment
Messages
5,188
Location
Sussex, UK
OK, now I need a statistician to explain Cohen's rules of thumb, and to re-interpret the PACE Trial results, in language that I can easily understand

Always here for a challenge! If you file a coin so that it is lopsided and comes down heads with a probability of 0.501 (i.e. 501 times in a thousand, rather than 500), you would never detect it by tossing a coin 10 times, 100 times or even 1000 times. But if you tossed the coin millions of times, you would detect the slight bias in heads. The reason for that is simple. For ten tosses, 6H and 4T (60% heads) is only one out from 5H and 5T, so the proportion of heads to tails can change dramatically with just a few variations. If you tossed a coin a thousand times, you would need an extra hundred heads (600 to 400) to get 60% heads, which isn't so likely. So 501 heads out of a thousand isn't too unusual - just a single blip - but scaled up, 501,000 heads out of a million needs an extra 1,000 heads - more than just a blip. If you have large enough samples you can detect much smaller changes. These would be statistically significant (i.e. a statistician would say, that's pretty unusual, something must be up!), but pretty trivial in the real world.
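If anyone fancies checking this on a computer, here is a rough simulation sketch (my own illustration of the analogy above - the 0.501 bias and the sample sizes are just the numbers from the previous paragraph):

```python
import random

random.seed(1)

def count_heads(n, p=0.501):
    """Toss a coin with P(heads) = p, n times; return the number of heads."""
    return sum(random.random() < p for _ in range(n))

# The bias adds about 0.001*n extra heads on average, but the natural
# spread of the count is roughly 0.5*sqrt(n). The bias only becomes
# detectable once the expected excess clearly beats the spread.
for n in (10, 100, 1_000, 1_000_000):
    heads = count_heads(n)
    excess = 0.001 * n        # expected extra heads from the bias
    spread = 0.5 * n ** 0.5   # about one standard deviation of the count
    print(f"n={n:>9}: heads={heads:>7}, expected excess={excess:8.1f}, spread={spread:7.1f}")
```

Only around a million tosses does the expected excess (1,000) clearly beat the spread (500): statistically detectable, still trivially small.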

Now men's heights are roughly mean 70 inches, s.d. 3 inches; women's are mean 64 inches, s.d. 3. Cohen says that if you calculate 70 - 64 = 6, then divide by the average standard deviation (3), 6/3 = 2, that final figure tells you in real-world terms how noticeable the difference is. In other words, the average man's height is two standard deviations above the average woman's.

Doing the same calculation to compare GET with SMC on the Chalder scale, we have outcomes of 20.6 (7.5) and 23.8 (6.6). So we calculate 23.8 - 20.6 = 3.2. Finding the "average" standard deviation isn't a question of adding and dividing by two though, because standard deviations are really multiplying or scaling factors. To average them you need to calculate the square root of [(7.5 squared + 6.6 squared)/2] = 7.06. Then 3.2/7.06 = 0.45, which Cohen would call a medium effect size - in other words the difference between the SMC group and the GET group is about half the standard deviation. (I think I have done the sums right, but I am a maths teacher, so I am used to people spotting my mistakes for me.) In effect this is what the PACE trial has done, by considering a difference of 1 s.d. to be significant.
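For anyone who wants to check the sums above, here is the same calculation as a short sketch (the means and SDs are the GET and SMC Chalder figures quoted above; the pooled SD is the root-mean-square formula described in the previous paragraph, which assumes equal group sizes):

```python
from math import sqrt

def cohens_d(mean1, sd1, mean2, sd2):
    """Cohen's d using the root-mean-square of the two SDs,
    as in the calculation above (equal group sizes assumed)."""
    pooled_sd = sqrt((sd1**2 + sd2**2) / 2)
    return (mean2 - mean1) / pooled_sd

# GET: mean 20.6, SD 7.5; SMC: mean 23.8, SD 6.6 (Chalder scale, lower = better)
print(round(cohens_d(20.6, 7.5, 23.8, 6.6), 2))  # 0.45
# And the heights example from earlier: men 70 (3), women 64 (3)
print(round(cohens_d(64, 3, 70, 3), 2))          # 2.0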

So by now you will be feeling pretty disappointed. BUT (and it is in capitals) I don't think that is relevant. If every day you had 18 to 20 nuisance phone calls (say from a friend who wanted to chat about PACE results), but by having a word with the friend's partner, it was cut down to 15 or 16, that would count as a large effect. The trouble is that this takes no account of what is normal (i.e. no nuisance calls). If there were only people with ME in the world we would be doomed (perhaps that is what happened to the dinosaurs!), but in that world the kind of change that the PACE trial recorded would be noticed. But all the time we are surrounded by people who are well, so the changes are tiny when compared with the population at large.

Statistics cannot really give us a value for that because it is a judgement about quality of life. The question is, how much of a move towards real health would you consider as significant?

Sorry for such a long posting.
 

wdb

Senior Member
Messages
1,392
Location
London
Arguably a more accurate coin analogy would be to take two groups A & B, get each participant in each group to toss a coin 1000 times without keeping count, and then write down how many times they believed it had landed heads. Then give just group B 15 long sessions of a procedure which they are told that evidence from previous trials suggests will make them very likely to toss heads much more often than tails, and indirectly suggest they will be a disappointing failure if they do not achieve this to some extent. Then repeat the test, with each participant tossing the coin 1000 times as before, and compare the reported scores from groups A & B.
 

oceanblue

Guest
Messages
1,383
Location
UK
Doing the same calculation to compare GET with SMC on the Chalder scale, we have outcomes of 20.6 (7.5) and 23.8 (6.6). So we calculate 23.8 - 20.6 = 3.2. Finding the "average" standard deviation isn't a question of adding and dividing by two though, because standard deviations are really multiplying or scaling factors. To average them you need to calculate the square root of [(7.5 squared + 6.6 squared)/2] = 7.06. Then 3.2/7.06 = 0.45, which Cohen would call a medium effect size - in other words the difference between the SMC group and the GET group is about half the standard deviation. (I think I have done the sums right, but I am a maths teacher, so I am used to people spotting my mistakes for me.) In effect this is what the PACE trial has done, by considering a difference of 1 s.d. to be significant.

... But all the time we are surrounded by people who are well, so the changes are tiny when compared with the population at large.

Statistics cannot really give us a value for that because it is a judgement about quality of life. The question is, how much of a move towards real health would you consider as significant?
Very interesting explanation, thanks.

[very geeky post alert] Couple of points. I thought Cohen's 0.3-0.5 was usually interpreted as small, so the PACE GET effect would be small, or maybe more accurately 'small-to-moderate'. Also, there seems to be a lot of debate about which SDs to use. You've pooled the SDs of the post-treatment groups (which inevitably have larger SDs than the pre-treatment groups). Using the pre-treatment SDs would lead to a LARGER effect size (boo hiss).

I did find a paper by Olejnik that discussed how to handle SDs but didn't understand a word of it. If you can enlighten me as to what it means and which SDs are 'correct' I'd be grateful, but please don't bother with this if it will be a lot of work. Thanks
 

Bob

Senior Member
Messages
16,455
Location
England (south coast)
Arguably a more accurate coin analogy would be to take two groups A & B, get each participant in each group to toss a coin 1000 times without keeping count, and then write down how many times they believed it had landed heads. Then give just group B 15 long sessions of a procedure which they are told that evidence from previous trials suggests will make them very likely to toss heads much more often than tails, and indirectly suggest they will be a disappointing failure if they do not achieve this to some extent. Then repeat the test, with each participant tossing the coin 1000 times as before, and compare the reported scores from groups A & B.

So are you suggesting that the marginal results of the PACE Trial are purely down to response bias, and the clever and subtle manipulation of the study by the authors? How very cynical of you! ;)
 

Bob

Senior Member
Messages
16,455
Location
England (south coast)
Always here for a challenge! If you file a coin so that it is lopsided and comes down heads with a probability of 0.501 (i.e. 501 times in a thousand, rather than 500), you would never detect it by tossing a coin 10 times, 100 times or even 1000 times. But if you tossed the coin millions of times, you would detect the slight bias in heads. The reason for that is simple. For ten tosses, 6H and 4T (60% heads) is only one out from 5H and 5T, so the proportion of heads to tails can change dramatically with just a few variations. If you tossed a coin a thousand times, you would need an extra hundred heads (600 to 400) to get 60% heads, which isn't so likely. So 501 heads out of a thousand isn't too unusual - just a single blip - but scaled up, 501,000 heads out of a million needs an extra 1,000 heads - more than just a blip. If you have large enough samples you can detect much smaller changes. These would be statistically significant (i.e. a statistician would say, that's pretty unusual, something must be up!), but pretty trivial in the real world.

Now men's heights are roughly mean 70 inches, s.d. 3 inches; women's are mean 64 inches, s.d. 3. Cohen says that if you calculate 70 - 64 = 6, then divide by the average standard deviation (3), 6/3 = 2, that final figure tells you in real-world terms how noticeable the difference is. In other words, the average man's height is two standard deviations above the average woman's.

Doing the same calculation to compare GET with SMC on the Chalder scale, we have outcomes of 20.6 (7.5) and 23.8 (6.6). So we calculate 23.8 - 20.6 = 3.2. Finding the "average" standard deviation isn't a question of adding and dividing by two though, because standard deviations are really multiplying or scaling factors. To average them you need to calculate the square root of [(7.5 squared + 6.6 squared)/2] = 7.06. Then 3.2/7.06 = 0.45, which Cohen would call a medium effect size - in other words the difference between the SMC group and the GET group is about half the standard deviation. (I think I have done the sums right, but I am a maths teacher, so I am used to people spotting my mistakes for me.) In effect this is what the PACE trial has done, by considering a difference of 1 s.d. to be significant.

So by now you will be feeling pretty disappointed. BUT (and it is in capitals) I don't think that is relevant. If every day you had 18 to 20 nuisance phone calls (say from a friend who wanted to chat about PACE results), but by having a word with the friend's partner, it was cut down to 15 or 16, that would count as a large effect. The trouble is that this takes no account of what is normal (i.e. no nuisance calls). If there were only people with ME in the world we would be doomed (perhaps that is what happened to the dinosaurs!), but in that world the kind of change that the PACE trial recorded would be noticed. But all the time we are surrounded by people who are well, so the changes are tiny when compared with the population at large.

Statistics cannot really give us a value for that because it is a judgement about quality of life. The question is, how much of a move towards real health would you consider as significant?

Sorry for such a long posting.

Well that's perfectly clear then! :eek:

I'm going to have to work this one out slowly.


If every day you had 18 to 20 nuisance phone calls (say from a friend who wanted to chat about PACE results)...

I'm sorry about all of those phone calls, Graham. :(
 

Bob

Senior Member
Messages
16,455
Location
England (south coast)
Very interesting explanation, thanks.

[very geeky post alert] Couple of points. I thought Cohen's 0.3-0.5 was usually interpreted as small, so the PACE GET effect would be small, or maybe more accurately 'small-to-moderate'. Also, there seems to be a lot of debate about which SDs to use. You've pooled the SDs of the post-treatment groups (which inevitably have larger SDs than the pre-treatment groups). Using the pre-treatment SDs would lead to a LARGER effect size (boo hiss).

I did find a paper by Olejnik that discussed how to handle SDs but didn't understand a word of it. If you can enlighten me as to what it means and which SDs are 'correct' I'd be grateful, but please don't bother with this if it will be a lot of work. Thanks

Here's some info from online...

Cohen suggested that d=0.2 be considered a 'small' effect size, 0.5 represents a 'medium' effect size and 0.8 a 'large' effect size.
http://staff.bath.ac.uk/pssiw/stats2/page2/page14/page14.html

Cohen (1988) created the following categories to interpret d:
• Small = .2
• Medium = .5
• Large = .8

http://web.me.com/rsbalkin/Site/Research_Methods_and_Statistics_files/Effect Size outline.pdf
 

Graham

Senior Moment
Messages
5,188
Location
Sussex, UK
You know me and analogies: well I think I have found one that illustrates why I am not very bothered whether Cohen's formula shows that the effect is big or small. All the following numbers given are properly scaled versions of the GET/SMC Chalder scores in terms of exam percentages (where 33 on the Fatigue Scale is 0%, and 0 is 100%).

Imagine that you desperately want to be a doctor, in fact a psychiatrist, but you are told that you need a grade A at A-level Statistics to be accepted on the course. You took the exam last summer and got 15%. You are told that, on average, people who join the School's Maths Class for the year, and work hard, increase that mark to 29%. Those who also have a dozen or so of Graham's Expensive Tuition sessions in addition to the usual SMC, on average, increase their mark to 38% (which could be a grade E or D).

Paying for all those extra maths lessons on average adds a further 9% to the students' marks. Cohen would tell you that these improvements are large in relation to how bad you were at statistics, but are they worthwhile?
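As a quick check on the rescaling (my inference from the numbers above: a Chalder score of 33 maps to 0% and 0 maps to 100%, so exam% = (33 - score)/33 x 100):

```python
def chalder_to_exam_percent(score, worst=33):
    """Rescale a Chalder fatigue score (0 = best, 33 = worst) to an
    'exam mark' where 33 maps to 0% and 0 maps to 100%."""
    return (worst - score) / worst * 100

for label, score in [("baseline", 28.3), ("SMC", 23.8), ("GET", 20.6)]:
    print(f"{label}: {chalder_to_exam_percent(score):.0f}%")
# baseline: 14%, SMC: 28%, GET: 38% - within a point of the 15%,
# 29% and 38% used in the analogy (the analogy rounds generously).
```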

What if you are told that 13 out of every 15 students who have the extra lessons only increase by 1% to 30%, but that 2 students increase to 91%? That Graham has lots of data about the students but hasn't put the time in to determine which characteristics ensure success? That now everyone who wants to retake Statistics has to pay for all the extra lessons on the grounds that some of them will make it? And, just incidentally, Graham will make a lot of money out of it.

My attitude is that statistics should simplify and illuminate, but we should not let calculations take the place of value judgements.
 

Graham

Senior Moment
Messages
5,188
Location
Sussex, UK
Very interesting explanation, thanks.

[very geeky post alert] Couple of points. I thought Cohen's 0.3-0.5 was usually interpreted as small, so the PACE GET effect would be small, or maybe more accurately 'small-to-moderate'. Also, there seems to be a lot of debate about which SDs to use. You've pooled the SDs of the post-treatment groups (which inevitably have larger SDs than the pre-treatment groups). Using the pre-treatment SDs would lead to a LARGER effect size (boo hiss).

I did find a paper by Olejnik that discussed how to handle SDs but didn't understand a word of it. If you can enlighten me as to what it means and which SDs are 'correct' I'd be grateful, but please don't bother with this if it will be a lot of work. Thanks

It is a bit of a mess, isn't it, with the standard deviations? I was being quite sloppy about which ones to use. It all seems to depend on what assumptions you make about the underlying pattern of results. I started to read the document you gave me a link to, and thought I was going OK, but then my brain turned into a pile of mush and I lost the will to live. Statistics does that to me. I may have another go at it - I hate giving up - but I'm not convinced that it is any more relevant than calculating how many angels can fit on the head of a pin.

Pretty much all the A-level maths teachers that I know hate teaching statistics, because you keep having to say "it all depends".
 

anciendaze

Senior Member
Messages
1,841
O.K. Graham, I'll take your word on a good bit of that, but here's what keeps bugging me. A normal (Gaussian) distribution is completely described by two parameters, mean and variance (or SD). The underlying distribution for the population being sampled requires four parameters, according to several different attempts I have made to fit the data. (e.g. mean, SD, kurtosis, skewness)

If the distribution requires four parameters, and you only calculate two, what significance do any inferences based on those parameters have?
 

Mark

Senior Member
Messages
5,238
Location
Sofa, UK
You know me and analogies: well I think I have found one that illustrates why I am not very bothered whether Cohen's formula shows that the effect is big or small. All the following numbers given are properly scaled versions of the GET/SMC Chalder scores in terms of exam percentages (where 33 on the Fatigue Scale is 0%, and 0 is 100%).

Imagine that you desperately want to be a doctor, in fact a psychiatrist, but you are told that you need a grade A at A-level Statistics to be accepted on the course. You took the exam last summer and got 15%. You are told that, on average, people who join the School's Maths Class for the year, and work hard, increase that mark to 29%. Those who also have a dozen or so of Graham's Expensive Tuition sessions in addition to the usual SMC, on average, increase their mark to 38% (which could be a grade E or D).

Paying for all those extra maths lessons on average adds a further 9% to the students' marks. Cohen would tell you that these improvements are large in relation to how bad you were at statistics, but are they worthwhile?

What if you are told that 13 out of every 15 students who have the extra lessons only increase by 1% to 30%, but that 2 students increase to 91%? That Graham has lots of data about the students but hasn't put the time in to determine which characteristics ensure success? That now everyone who wants to retake Statistics has to pay for all the extra lessons on the grounds that some of them will make it? And, just incidentally, Graham will make a lot of money out of it.

My attitude is that statistics should simplify and illuminate, but we should not let calculations take the place of value judgements.

:victory:

Genius! Perfectly fair and easily understandable analogy, and the GET gag made me laugh out loud to boot. I'll have some of Graham's Expensive Tuition please! :D
 

Graham

Senior Moment
Messages
5,188
Location
Sussex, UK
O.K. Graham, I'll take your word on a good bit of that, but here's what keeps bugging me. A normal (Gaussian) distribution is completely described by two parameters, mean and variance (or SD). The underlying distribution for the population being sampled requires four parameters, according to several different attempts I have made to fit the data. (e.g. mean, SD, kurtosis, skewness)

If the distribution requires four parameters, and you only calculate two, what significance do any inferences based on those parameters have?

Well, I'll have a shot at this one from a general mathematical point of view: remember that I'm no statistics whizz-kid (well, I can't even claim the kid bit). The number of bits of data you need to specify any distribution varies according to the pattern. A Normal/Gaussian curve follows a strong pattern and only needs two measurements to define it (2 parameters) - the mean and the variance. So the curve has two degrees of freedom - where it is centred (the mean) and how fat it is (the variance). If you know those, you can draw the curve. A Poisson distribution is even more strongly patterned and only needs one parameter - both the mean and the variance are equal - it only has one degree of freedom. If you increase the mean, you increase the variance.

But if you take 50 different measurements of something, these are independent values, so have 50 degrees of freedom, or 50 parameters. If you calculate the mean, that isn't a new bit of information: it sort of uses up, or pins down, one of those degrees of freedom. If you then calculate the variance (or standard deviation), you are then doing a calculation based on the remaining 49 degrees of freedom (which is why you divide by 49 rather than 50 in the calculation). That still leaves 48 degrees of freedom. So whatever distribution you think it fits, you are dumping all those 48 degrees of freedom according to your theory of how they are inter-related. That is the nature of using statistics to simplify things.
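(For the spreadsheet-inclined, the divide-by-49 point in a couple of lines - this is just the standard n - 1 correction, assuming numpy is to hand:)

```python
import numpy as np

data = np.random.default_rng(0).normal(size=50)  # 50 measurements
print(data.var(ddof=0))  # divides by n = 50
print(data.var(ddof=1))  # divides by n - 1 = 49, as described above
```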

I have been playing around with 160 items of data and trying to get them to fit the baseline and final outcomes for the SMC group on the Chalder Fatigue scale (yes, "sad" is the right word!). I set up 160 cells for each patient's baseline score, then entered values to get a mean of 28.3 - the easiest way of doing that is to set up a calculation that works out the 160th data item so that it balances. Now I only need to invent 159 data items, and the last one will be filled in to give the correct mean. Then, if I did the same for the standard deviation (which is too tricky to bother with), it would determine the 159th data item. That still leaves me 158 independent data items to make up. So what sort of pattern should I assume it fits? Actually it is very hard to produce a mean of 28.3 and a standard deviation of 3.8 on the standard distribution patterns, and even harder to produce the final results of a mean of 23.8 and a standard deviation of 6.6. The "brick wall" of 33 wrecks most of them. That convinces me of one thing: there is no simple distribution pattern to this data. Any attempt to "measure" the results with just a few parameters is going to lead to difficulties or confusion, because the data just does not simplify to a pattern very easily (another reason why I believe we are dealing with multiple causes of CFS/ME).
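Here is a rough sketch of that "brick wall" effect (my illustration, not the actual spreadsheet: draw normal scores with the target mean and SD, clip them at the scale maximum of 33, and watch the summary statistics drift):

```python
import numpy as np

rng = np.random.default_rng(42)

def clipped_sample(target_mean, target_sd, n=160, ceiling=33):
    """Draw n normal scores, then clip at the scale maximum,
    mimicking the hard top of the Chalder fatigue scale."""
    return np.minimum(rng.normal(target_mean, target_sd, n), ceiling)

x = clipped_sample(28.3, 3.8)
print(x.mean(), x.std(ddof=1))
# Clipping drags the mean below 28.3 and the SD below 3.8: close to
# the ceiling, a plain normal shape cannot hold both target figures.
```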

I'm afraid that's a rather philosophical argument, rather than a statistical one, but, as a mathematician, it makes sense to me (and is why statistics wasn't considered part of mathematics when I was at university). As I said, I'm no statistician!

Hope that helps you understand where I am coming from. I find it hard to put it into the right words.
 

Graham

Senior Moment
Messages
5,188
Location
Sussex, UK
Very interesting explanation, thanks.

[very geeky post alert] I did find a paper by Olejnik that discussed how to handle SDs but didn't understand a word of it. If you can enlighten me as to what it means and which SDs are 'correct' I'd be grateful, but please don't bother with this if it will be a lot of work. Thanks

I'm on a roll today. Apologies all round for hogging the site!

Cohen simply looks at how big the improvement is in comparison to the amount of variation you get (the improvement is the difference in means, and the amount of variation is the standard deviation). The problem is, which measure of variation do you use? If you use the baseline standard deviation, you are saying "If everyone was like this baseline group, then how obvious would a change like this be?" So, using the SMC group and Chalder, as changes from the baseline "average out" at a standard deviation of 3.6, an improvement from 28.3 to 23.8 - a change of 4.5 - would be noticed. Or you could look at the way the standard deviation varies at each stage - 3.6, 6.5, 6.9, and 6.6 - and devise some way of combining these to get a measurement of the amount of variation in the group that includes the ones who improve. Or you could decide to work backwards and use the variation at the end (6.6) to ask how noticeable the journey that they have just been on is. Personally, if forced to use this measure, I wouldn't use any of those. I would find the standard deviation of the whole population - from healthy to ill - and use that: i.e. from anyone's perspective, how much would they notice such a change? I don't know what that standard deviation is, but it will be a lot more than 6.6 if it is to include everyone.
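To make the choice concrete, here is the same 4.5-point SMC improvement divided by the two candidate SDs from the figures above (just a sketch of how much the answer swings):

```python
change = 28.3 - 23.8  # SMC improvement on the Chalder scale

for name, sd in [("baseline SD", 3.6), ("final SD", 6.6)]:
    print(f"{name}: effect size = {change / sd:.2f}")
# baseline SD: 1.25, final SD: 0.68 - and a whole-population SD,
# if it really is "a lot more than 6.6", would shrink it further.
```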
 

Dolphin

Senior Member
Messages
17,567
(Junk) Walking speeds

My very overweight sister (BMI around 35) in her mid 30s (who has asthma) did a 10 km walk yesterday (in aid of ME/CFS). There was also a bit of a walk beforehand - probably 2000 metres anyway. And there were a lot of people (40,000), so she couldn't go at normal speed at the start anyway. She did it in 110 minutes, the equivalent of 545m for the 6-minute walk (but presumably she could have gone a lot faster if all she had to do was walk for 6 minutes rather than 110 minutes). For me, it helps to put the walking speeds achieved in the PACE Trial (379m was the highest average) into perspective.
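(The 6-minute equivalent works out like this - just the arithmetic from the figures above:)

```python
distance_m = 10_000  # the 10 km walk
time_min = 110

pace = distance_m / time_min  # metres per minute
print(pace * 6)               # ~545 m per 6 minutes
# Versus 379 m, the best 6-minute-walk average reported in PACE.
```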
 

oceanblue

Guest
Messages
1,383
Location
UK
Not sure your sis would be too flattered by your description after she's just walked 10km to raise funds for this illness but I take your point:D. Congratulations and thanks to her for the feat.
 

oceanblue

Guest
Messages
1,383
Location
UK
If you use the baseline standard deviation, you are saying "If everyone was like this baseline group, then how obvious would a change like this be?"
Nice description!

Somewhere in the bowels of the thread is a post (from Marco, I think) referring to research where they asked patients how much improvement was significant, which seems like a good way to go. I think they asked patients who had improved after a trial, so it was based on their actual improvement rather than an abstract question, then converted the patients' answers back to SDs (this is called anchoring, iirc).

Normally, I would say the baseline was a better reference for SD as the treatment group SD depends on how effective the treatment is and how uniformly it helps. Unfortunately, as PACE used the primary outcome measures to select patients in the first place, the baseline SD is artificially constrained. That's why I was interested in finding an appropriate pooled SD.

The SD for the Chalder scale is around 4 for a general population (it has a floor effect, where most people will score 11 (= normal) and very few score less, thereby constraining the SD).
 

anciendaze

Senior Member
Messages
1,841
O.K. Graham, I see that we are communicating at some level. I had avoided talking about multiple moments of the distribution to save those with less background from being left out.

Try my second attempt at modeling the population distribution: one normal distribution with a mean of 95 or 100 and an SD of less than 15, plus one normal distribution with a much lower mean and much larger SD. You can label these two subpopulations "healthy" and "sick". What does this do to the assumption behind the PACE trial that CFS sufferers are merely deconditioned healthy people with "false illness beliefs"?
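For anyone who wants to play with that idea, a sketch of the two-population shape (the healthy mean and SD follow the description above; the sick-group parameters and the mixing weight are invented purely for illustration, on a 0-100 scale like the SF-36 physical function score):

```python
import numpy as np

rng = np.random.default_rng(1)

def mixed_population(n=10_000, sick_fraction=0.2):
    """Mixture of a tight 'healthy' normal and a broad, low 'sick' normal.
    The sick-group mean/SD and the 20% weight are made-up illustrations."""
    sick = rng.random(n) < sick_fraction
    healthy = rng.normal(95, 12, n)  # mean ~95, SD below 15
    ill = rng.normal(40, 20, n)      # much lower mean, much larger SD
    return np.clip(np.where(sick, ill, healthy), 0, 100)

scores = mixed_population()
print(scores.mean(), scores.std(ddof=1))
# A single mean and SD summarise this bimodal shape very poorly -
# which is the point: two parameters cannot describe two populations.
```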
 

Graham

Senior Moment
Messages
5,188
Location
Sussex, UK
Hi anciendaze - sorry if I didn't get the level right! I'm still getting to grips with the different specialities here. As I said, I am no statistician: primarily I am a mathematician, and the two are not the same - to my mind it is the difference between physics and biology. I think this discussion could get too technical for posting, so I'll contact you by a private message, but I don't think my statistical skills are up to satisfying your challenge. I'm more of a hammer and chisel mathematician than a brain surgeon.
 

Dolphin

Senior Member
Messages
17,567
I'm on a roll today. Apologies all round for hogging the site!

Cohen simply looks at how big the improvement is in comparison to the amount of variation you get (the improvement is the difference in means, and the amount of variation is the standard deviation). The problem is, which measure of variation do you use? If you use the baseline standard deviation, you are saying "If everyone was like this baseline group, then how obvious would a change like this be?" So, using the SMC group and Chalder, as changes from the baseline "average out" at a standard deviation of 3.6, an improvement from 28.3 to 23.8 - a change of 4.5 - would be noticed. Or you could look at the way the standard deviation varies at each stage - 3.6, 6.5, 6.9, and 6.6 - and devise some way of combining these to get a measurement of the amount of variation in the group that includes the ones who improve. Or you could decide to work backwards and use the variation at the end (6.6) to ask how noticeable the journey that they have just been on is. Personally, if forced to use this measure, I wouldn't use any of those. I would find the standard deviation of the whole population - from healthy to ill - and use that: i.e. from anyone's perspective, how much would they notice such a change? I don't know what that standard deviation is, but it will be a lot more than 6.6 if it is to include everyone.
This letter published in the Lancet http://forums.phoenixrising.me/show...and-editorial)&p=179768&viewfull=1#post179768 makes a similar point about the SDs when measuring the effect size.
 

Dolphin

Senior Member
Messages
17,567
Somewhere in the bowels of the thread is a post (from Marco, I think) referring to research where they asked patients how much improvement was significant, which seems like a good way to go. I think they asked patients who had improved after a trial, so it was based on their actual improvement rather than an abstract question, then converted the patients' answers back to SDs (this is called anchoring, iirc).
Yes, I've read a bit about anchors in these discussions - but can't remember it now.

This paper talks a bit about the issue generally - although not so much about choosing numerical values for one specific scale:

Quality and acceptability of patient-reported outcome measures used in chronic fatigue syndrome/myalgic encephalomyelitis (CFS/ME): a systematic review.

Qual Life Res. 2011 May 18. [Epub ahead of print]

Haywood KL, Staniszewska S, Chapman S.

at
http://forums.phoenixrising.me/show...ient-reported-outcome-measures-used-in-CFS-ME

Last two paragraphs of discussion section

Dominance of the assessment of emotional well-being and the symptomatic impact of CFS/ME illustrated in this review, and the lack of CFS/ME-specific measure that captures the broad-ranging experience of the condition highlights the discrepancies that exist between outcomes assessed in research and those identified by patients as significant to their experience of living with CFS/ME, including social well-being and physical function [6]. The poor quality of reviewed PROMs combined with the failure to measure genuinely important patient outcomes suggests that high quality and relevant information about treatment effect is lacking [123]. Such dilemmas have been reported in other conditions, for example, Diabetes [124, 125]. Exemplified in rheumatology [126], resources are required to drive consensus between patients with CFS/ME and health professionals towards the standardization of assessment practice, identification of important and appropriate health outcomes and selection of good quality, acceptable measures. The resulting availability of relevant and credible information about treatment effect will support informed decision-making and patient choice [127].

A more scientific focus towards the collaborative development and ongoing evaluation of PROMs in CFS/ME is essential to the future of evidence-based health care. Without investment in appropriate and rigorous measurement of health, the true burden of CFS/ME and the relative success of health care will not be fully understood and communicated. This investment should include the development of a patient-derived assessment that includes detailed descriptions of CFS/ME-specific health and quality of life across the range of domains considered important by patients with CFS/ME. Such a measure should be developed collaboratively with the full involvement of people with CFS/ME as partners in the process. The lack of such a measure is an important omission from the battery of assessment approaches in CFS/ME and must be addressed as a matter of urgency.