Measures of outcome for trials and other studies

Simon

Senior Member
Messages
3,789
Location
Monmouth, UK
[re @Bob] I am leaning towards something like this at the moment.

1. Does the person actually think they are better, all things considered...

2. Is there physiological evidence to support the person's claim that they are better? (To reduce the chance they are just saying they are better to please you, and other stuff.) Tilt table seems to have its pros and cons. Maybe thinking tests should go in here (I hate 'cognitive').

3. Is there a change in real-life ongoing activities of daily living that supports the claim of being better? Actometry/actigraphy seems to have a lot going for it here, although to be sensitive and flexible enough it may need to be based on an iWatch7 app not yet invented.

If there is 50% improvement on all three then I think we begin to have an indication that a treatment can be seriously useful.
As a scoring system, I think this is really elegant. My concern is that currently there is nothing capable of doing point 2 re physiological evidence (including thinking tests). CPET, gene expression post-exercise, NK cells, neurocog tests: despite a lot of potential, none has yet been shown to reliably identify ME/CFS patients or track disease status/symptom severity.
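The three-part rule quoted above (at least 50% improvement in self-report, physiological evidence and real-life activity) can be sketched as an all-domains threshold check. This is a minimal sketch that assumes each domain has already been expressed as an impairment score where higher means worse; the domain names and numbers are hypothetical.

```python
def percent_improvement(baseline: float, follow_up: float) -> float:
    """Percentage improvement on a scale where higher means worse (bad = high)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive on a bad=high scale")
    return 100.0 * (baseline - follow_up) / baseline


def meets_all_domains(scores: dict[str, tuple[float, float]], threshold: float = 50.0) -> bool:
    """True only if every domain improves by at least `threshold` percent."""
    return all(percent_improvement(b, f) >= threshold for b, f in scores.values())


# Hypothetical (baseline, follow-up) impairment scores for one participant.
participant = {
    "self_report": (80.0, 35.0),          # 56.25% better
    "physiological": (60.0, 28.0),        # about 53.3% better
    "activity_limitation": (70.0, 40.0),  # about 42.9% better
}
print(meets_all_domains(participant))  # False: activity falls short of 50%
```

Because every domain must pass, a large gain in one cannot hide a shortfall in another.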

Is there any symptom or measurable sign (e.g. biochemical abnormality) that we all have in common?
If only; it's still the Holy Grail of research.

Should PEM be excluded on the basis that people try to avoid it, or should some sort of PEM challenge be a compulsory feature?
That's a huge issue, especially as most people agree PEM is the defining symptom of the illness. But how to measure it? I think Lenny Jason does great work, but he scores a symptom by its severity and frequency, when people go out of their way to minimise both. Maybe a challenge of some sort is needed, or conditional questioning, e.g. 'if you do this, would you', or simply 'to what extent do you avoid', though I guess both are hard to quantify.
 

Jonathan Edwards

"Gibberish"
Messages
5,256
I seem to be missing an understanding here. When you say "single measure", do you mean a single overall score made up of a number of aspects (like the Berg scale?), and something that is used in all cases for all trials, or not that at all?

Yes, I am suggesting an overall rating derived from several measurements. I would not want to create something compulsory for all trials because every trial has slightly different requirements, but I am thinking in terms of a general rationale for building scoring systems for trials and maybe a core set of suggestions that might be used frequently for standard format proof of concept phase III studies.
 

eafw

Senior Member
Messages
936
Location
UK
I am not sure how severity and length of illness come into it.

Because at one end you would have someone who works three days a week and goes to the gym once, and travels, and socialises (and would like to work four days and go to the gym twice and travel more and socialise more, and as they are in the early stages actually has some chance of that), and then you have people who have been ill for 20 years and for whom getting a shower once a week is the week's major event...

I am wondering, thinking out loud as much as anything, what currently available physiological measure could be usefully used in common across those groups?

Maybe something would make sense equally applied, but the manifestations and severity are so different, wouldn't we need different scales?
 

Simon

Senior Member
Messages
3,789
Location
Monmouth, UK
RE:
Jonathan Edwards said:
1. Does the person actually think they are better, all things considered..
I know there's been a lot of discussion of how best to measure this, including just asking people if they consider they have improved. I'm pretty sure there's been research showing participants have poor memories (even without ME/CFS) and such scales (e.g. Clinical Global Impression = overall better/worse) are not very reliable. But more specific scales might be useful, e.g.
As I said before, Dr. David Bell's "ME/CFS Ability Scale" is an example of a place to start... it certainly could be improved upon.
...here is Dr Bell's scale (I'm not saying the scale is by any means perfect or even usable; I'm just using it as an example of somewhere to start the conversation):

100: Patient has no symptoms at rest, no symptoms with exercise; has normal overall activity; is able to work full-time without difficulty...

I think we could design such a questionnaire between us, but perhaps it would be too complicated to test it to see if it was useful.
Yup, especially judging by this thread.

Another ME-specific scale is the AYME Functional Ability Scale, originally developed from the widely used Karnofsky Performance Scale and then refined in consultation with ME/CFS patients. It's not really been properly validated, but I like the way it looks at work/study, social life and symptom level (because of the usual trade-off that's been discussed).

Some examples:
100% FULLY RECOVERED

No symptoms, even following physical or mental activity. Able to study (or work) full time without difficulty, AND enjoy a social life.


90% MILDLY AFFECTED

No symptoms at rest. Mild symptoms following physical or mental activity ‐ tire easily. Study/work full time with some difficulty. Social life rather restricted with gradual recovery over 2/3 days.



70% MODERATELY AFFECTED

Mild symptoms at rest, worsened to severe by physical or mental activity. Daily activity limited. Part time study at school/college is very tiring, and may be restricting social life. Part time work may be possible for a few hours in the day. With careful pacing of activities and rest periods, you may discover windows of time during the day when you feel significantly better. Gentle walking or swimming can be beneficial.



50% MODERATE TO SEVERELY AFFECTED

Moderate symptoms at rest. Increasing symptoms following physical or mental activity. Midday rest may still be needed. Simple, short (1hr) home study/home activity possible, when alternated with quiet, non‐active social life. Concentration is limited. Not confined to the house, but may be unable to walk much beyond 100/200m without support. May manage a trip to the shops in the wheelchair.



30% SEVERELY AFFECTED

Moderate to severe symptoms at rest. Severe symptoms following any physical or mental activity. Usually confined to the house but may occasionally take a quiet wheelchair ride or very short, gentle walk in the fresh air. Most of the day resting. Very small tasks possible but mental concentration poor and home study difficult....
 

Jonathan Edwards

"Gibberish"
Messages
5,256
Because at one end you would have someone who works three days a week and goes to the gym once, and travels, and socialises (and would like to work four days and go to the gym twice and travel more and socialise more, and as they are in the early stages actually has some chance of that), and then you have people who have been ill for 20 years and for whom getting a shower once a week is the week's major event...

I am wondering, thinking out loud as much as anything, what currently available physiological measure could be usefully used in common across those groups?

Maybe something would make sense equally applied, but the manifestations and severity are so different, wouldn't we need different scales?

You might, and that could be accommodated. Remember that I am suggesting a percentage change based scoring. That seems to work across all severities in rheumatoid arthritis but clearly ME is more difficult for several reasons. We are struggling to think of a physiological measure, but at least there are some suggestions.
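A rough illustration of why percentage-change scoring can work across severities: two hypothetical participants with very different baseline impairment (on a bad = high scale) who each halve their score both come out at 50% better, even though their absolute changes differ. The numbers are invented for illustration.

```python
def percent_change(baseline: float, follow_up: float) -> float:
    """Percent improvement on a bad=high scale: 100 * (baseline - follow_up) / baseline."""
    return 100.0 * (baseline - follow_up) / baseline


mild = (20.0, 10.0)    # hypothetical mildly affected participant
severe = (90.0, 45.0)  # hypothetical severely affected participant

for label, (before, after) in [("mild", mild), ("severe", severe)]:
    print(label, "absolute change:", before - after, "percent:", percent_change(before, after))
```

Both lines report 50.0 percent, while the absolute changes are 10.0 and 45.0.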
 

eafw

Senior Member
Messages
936
Location
UK
That seems to work across all severities in rheumatoid arthritis but clearly ME is more difficult for several reasons. We are struggling to think of a physiological measure, but at least there are some suggestions.

Is the difference with ME that it is not just one "marker" that is worse in the severely affected and milder in the mildly affected, but that additional things start going wrong, which is why the disability seems to spiral down so badly?

So I'd still be inclined to split things somehow.

Those with POTS are perhaps a good example, as we do have ways to measure it that map very well to functionality. E.g., before: resting HR 64, two minutes standing brushing teeth 120 (and feeling very wobbly). After: resting HR 64, two minutes brushing teeth 90 (not good, but not nearly as wobbly). Average figures of course, from morning and evening measurements over a number of weeks.

This would seem to me to be a useful measure - but only for this group - no use for those who have no significant OI.
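The heart-rate example can be turned into a single number: take the rise from resting to standing heart rate, average it over repeated morning and evening readings, and compare before and after treatment. A minimal sketch; the individual readings below are hypothetical, chosen so they average to the 64 to 120 and 64 to 90 figures in the post.

```python
from statistics import mean


def mean_rise(readings: list[tuple[int, int]]) -> float:
    """Average increase from resting HR to HR after about 2 minutes standing (bpm)."""
    return mean(standing - resting for resting, standing in readings)


# Hypothetical (resting, standing) pairs from morning/evening measurements.
before = [(62, 118), (66, 122), (63, 119), (65, 121)]
after = [(63, 88), (65, 92), (64, 91), (64, 89)]

rise_before = mean_rise(before)  # 56
rise_after = mean_rise(after)    # 26
reduction = 100 * (rise_before - rise_after) / rise_before
print(f"rise before {rise_before} bpm, after {rise_after} bpm, {reduction:.1f}% smaller")
```

Using the rise rather than the raw standing rate keeps the measure comparable if resting heart rate drifts between visits.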
 

eafw

Senior Member
Messages
936
Location
UK
Has anyone mentioned body temperature? Not going through all nine pages again, but it's another easy measure when looking for changes in an individual.
 

A.B.

Senior Member
Messages
3,780
Why couldn't a scoring system such as this one work?

Final score = s1 * s2 * ... * sn

Where each s represents some measure of symptoms, or functioning, or a biomarker, or whatever makes sense, over time. Each s would be normalized to a range of 0 to 1, with 0 being worst and 1 being best. To have a maximum improvement (i.e. a final score of 1), one would have to be symptom-free, with completely normal functioning and a normal biomarker, 100% of the time. If a patient had a function score of 0.5 which failed to improve, while everything else improved to the maximum, they would still have a poor final score of 0.5. This system should be sensitive to any deterioration or lack of improvement.
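The multiplicative score can be written out directly. This sketch assumes each component has already been normalized to 0 (worst) to 1 (best); the component values are hypothetical.

```python
from math import prod


def final_score(components: list[float]) -> float:
    """Multiplicative composite: s1 * s2 * ... * sn, each s in [0, 1]."""
    if any(not 0.0 <= s <= 1.0 for s in components):
        raise ValueError("each component must be normalized to [0, 1]")
    return prod(components)


# Everything improves to the maximum except function, stuck at 0.5.
print(final_score([1.0, 0.5, 1.0]))  # 0.5
```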
 

Jonathan Edwards

"Gibberish"
Messages
5,256
Why couldn't a scoring system such as this one work?

Final score = s1 * s2 * ... * sn

Where each s represents some measure of symptoms, or functioning, or a biomarker, or whatever makes sense, over time. Each s would be normalized to a range of 0 to 1, with 0 being worst and 1 being best. To have a maximum improvement (i.e. a final score of 1), one would have to be symptom-free, with completely normal functioning and a normal biomarker, 100% of the time.

I think the maths needs to be more subtle. The objective is to measure a change for the better associated with treatment, so you need to work from differences. It is unlikely that any of the things measured are linear, so you can get into all sorts of bogus conclusions from multiplying them together before calculating a difference. A lot of the scales used are really pseudonumerical, in the sense that they allow a reasonably reliable gauge of better or worse, and much better or much worse, but the numbers used are not actually suitable for any arithmetical procedure. As with piano exams, going from Grade 3 to Grade 5 is not the same as going from Grade 6 to Grade 8. So I think you have to start from assessments of differences before you start compositing.
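A small illustration of the problem with multiplying before taking a difference: two hypothetical patients make the same change on one component (0.4 to 0.8) with nothing else changing, yet the change in the product depends on the other, unchanged component. Working from per-component differences first avoids that artefact. The numbers are invented.

```python
from math import prod


def change_in_product(before: list[float], after: list[float]) -> float:
    """Composite first, then difference: change in the multiplied score."""
    return prod(after) - prod(before)


def per_component_changes(before: list[float], after: list[float]) -> list[float]:
    """Difference first, per component; compositing can come later."""
    return [a - b for b, a in zip(before, after)]


# Same improvement on component 1; component 2 is unchanged but differs between patients.
patient_a = ([0.4, 0.2], [0.8, 0.2])
patient_b = ([0.4, 1.0], [0.8, 1.0])

for name, (b, a) in [("A", patient_a), ("B", patient_b)]:
    print(name, round(change_in_product(b, a), 2), per_component_changes(b, a))
```

The product says patient B improved five times as much as patient A, although both made exactly the same change.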

Differences are normalised to percentages in the ACR grading. This only really works if you score bad as high and healthy as zero. It may be interesting that some of the ME scoring systems go the other way with zero as terrible and 100 as healthy. It may be that this is inherently unsuitable for getting a decent composite measure. I would be interested in further thoughts from the mathematicians here.

The logic of bad being high and using percentages is presumably that we all intuitively understand the concept of 'being 50% better' or 'being 100% better'. It is easy to understand zero as 'nothing wrong' but not so easy to understand it in terms of 'as ill as possible' maybe. In RA a lot of measures are bad=high for obvious reasons - number of swollen joints or ESR. So maybe that is where it started but my suspicion is that it makes sense in terms of the broader objective.
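One way to see why scale direction matters for percentages: on a 0 = terrible, 100 = healthy scale, the same raw change gives a different "percent better" depending on whether you divide by the score itself or by the remaining distance from healthy. Flipping to bad = high (deficit = healthy minus score) makes "percent better" mean "percent of the deficit removed". A sketch with hypothetical numbers; the 100-point ceiling is an assumption.

```python
def pct_change_raw(before: float, after: float) -> float:
    """Naive percent change of a good=high score relative to its baseline."""
    return 100.0 * (after - before) / before


def pct_deficit_removed(before: float, after: float, healthy: float = 100.0) -> float:
    """Flip to bad=high (deficit = healthy - score), then percent reduction of the deficit."""
    deficit_before = healthy - before
    deficit_after = healthy - after
    return 100.0 * (deficit_before - deficit_after) / deficit_before


# Hypothetical function score rising from 40 to 60 on a 0-100, high=healthy scale.
print(pct_change_raw(40, 60))       # 50.0
print(pct_deficit_removed(40, 60))  # deficit 60 -> 40, i.e. about 33.3
```

The naive version also inflates gains from very low baselines (5 to 10 counts as 100% better), which is one reason a zero = healthy reference point is easier to interpret.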

I think one could very well argue that percentages are themselves bogus if we are dealing with non-linear pseudonumerical scales. Maybe differences should be scored in terms of points along a scale, such as 4 points better, which could be from 6 to 2 on a high=bad scale or from 3 to 7 on a low=bad scale. But again this does not seem to translate easily into the intuitive idea of percentage improvement, which for some reason is something that all patients and doctors seem to understand without it being explained.
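Scoring change as "points better" needs the scale direction made explicit: on a high = bad scale improvement means the number falls (e.g. 6 to 2), while on a low = bad scale it rises (e.g. 3 to 7). A minimal sketch; the function name and arguments are illustrative only.

```python
def points_better(before: int, after: int, high_is_bad: bool) -> int:
    """Points of improvement; positive means better, whatever the scale's direction."""
    return before - after if high_is_bad else after - before


print(points_better(6, 2, high_is_bad=True))   # 4
print(points_better(3, 7, high_is_bad=False))  # 4
```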

The last point is that if we are using different measures to check the reliability of each other then the threshold system of ACR looks to me more valid than multiplying. If a mango is red but woody hard then it is still not ripe - because it is woody hard. If these are pseudonumerical measures then I think this makes more sense.
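The threshold logic can be sketched as an ACR20-style responder rule. As usually described, ACR20 requires at least 20% improvement in both tender and swollen joint counts, plus at least 20% improvement in at least 3 of 5 other core measures. The sketch generalises that to "every core measure, plus k of the others"; which ME measures would count as core, and the thresholds, are assumptions for illustration.

```python
def pct_better(before: float, after: float) -> float:
    """Percent improvement on a bad=high measure."""
    return 100.0 * (before - after) / before


def responder(core: list[tuple[float, float]],
              others: list[tuple[float, float]],
              threshold: float = 20.0,
              min_others: int = 3) -> bool:
    """Every core measure must pass, plus at least `min_others` of the rest.

    Like the mango that is red but still woody hard, one failed core
    measure means 'not improved', however good the others look.
    """
    core_ok = all(pct_better(b, a) >= threshold for b, a in core)
    n_others = sum(pct_better(b, a) >= threshold for b, a in others)
    return core_ok and n_others >= min_others


# Hypothetical data: core measures improve by 50% and 25%; 3 of 5 others pass.
core = [(10, 5), (8, 6)]
others = [(60, 40), (50, 45), (7, 5), (30, 29), (40, 30)]
print(responder(core, others))  # True
```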

If we are thinking of reliability in terms of probability then maybe we can multiply and divide, just as in Bayes's theorem. But often these multiplications are quite complicated. You may be multiplying the probability of things being NOT what you want to confirm. If I see someone walking just like John, I might think the probability that it is not John is 0.2. If I see he has a hat of the colour John wears, the probability is 0.02 (because only 10% of hats are this colour). So I am 98% sure it is John. But you cannot multiply the probabilities that it IS John (0.8 x 0.9 = 0.72). And again, at the end of the day we want an idea of how much better to fall out, rather than just our confidence of being better.
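The arithmetic in the John example can be reproduced as follows, under the post's simplifying assumption that the gait and hat clues are independent, so the "not John" probabilities can be multiplied (a rough shortcut rather than a full Bayesian update with explicit priors and likelihoods). It also shows the tempting but invalid product of the "is John" probabilities.

```python
p_not_john_from_gait = 0.2  # from the post: walks just like John
p_hat_colour = 0.1          # from the post: only 10% of hats are this colour

# The post's shortcut: multiply the probabilities that it is NOT John.
p_not_john = p_not_john_from_gait * p_hat_colour
p_john = 1 - p_not_john
print(f"P(John) is about {p_john:.2f}")  # 0.98

# The invalid alternative: multiplying the probabilities that it IS John.
wrong = (1 - p_not_john_from_gait) * (1 - p_hat_colour)
print(f"Naive product: {wrong:.2f}")  # 0.72
```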

Maybe it is all a fudge however you do it, but for all its faults the ACR system does seem to work rather well, and I am not convinced that there is anything as good for ME.
 

MeSci

ME/CFS since 1995; activity level 6?
Messages
8,235
Location
Cornwall, UK
Is the difference with ME that it is not just one "marker" that is worse in the severely affected and milder in the mildly affected, but that additional things start going wrong, which is why the disability seems to spiral down so badly?

So I'd still be inclined to split things somehow.

Those with POTS are perhaps a good example, as we do have ways to measure it that map very well to functionality. E.g., before: resting HR 64, two minutes standing brushing teeth 120 (and feeling very wobbly). After: resting HR 64, two minutes brushing teeth 90 (not good, but not nearly as wobbly). Average figures of course, from morning and evening measurements over a number of weeks.

This would seem to me to be a useful measure - but only for this group - no use for those who have no significant OI.

It can measure simple physical activity, e.g. sitting down brushing teeth (which I often do!) - no positional change.

EDIT - by 'It' I meant heart rate, but other measures may be appropriate.
 

user9876

Senior Member
Messages
4,556
Has anyone mentioned body temperature? Not going through all nine pages again, but it's another easy measure when looking for changes in an individual.
There's a poll on body temperature here.
The poll is on average body temperature. What I've noticed with my daughter is that her temperature is quite variable, even within an hour, and tends to be higher when she feels particularly rough. Generally it would be about 36.6 °C, but it rises to around 37.8 °C when she has a particularly bad moment; an hour or so later (on the odd occasion I've tested it) it's back down to normal.

So I wonder if temperature variation is perhaps something worth measuring. But the trouble with things like this is that they would need validating with some sort of study. What I could see being worthwhile is some continuous monitoring of patients before using a measure in a trial, just to get a feel for the variation.
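If temperature variation rather than the average were the measure, a simple summary of each monitoring window would be its range and standard deviation. A sketch with invented readings in °C, loosely based on the 36.6 to 37.8 swing described above; the 15-minute sampling interval is an assumption, and any real use would need the validation the post mentions.

```python
from statistics import mean, stdev


def variation_summary(readings_c: list[float]) -> dict[str, float]:
    """Mean, range and sample standard deviation of a window of temperature readings."""
    return {
        "mean": round(mean(readings_c), 2),
        "range": round(max(readings_c) - min(readings_c), 2),
        "sd": round(stdev(readings_c), 2),
    }


# Hypothetical readings taken every 15 minutes over two hours.
window = [36.6, 36.7, 37.4, 37.8, 37.5, 37.0, 36.7, 36.6]
print(variation_summary(window))
```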
 

Jonathan Edwards

"Gibberish"
Messages
5,256
Effort should also go towards distinguishing the qualitative differences of symptoms compared with healthy controls, not merely the severity, e.g.: Examining Types of Fatigue Among Individuals with ME/CFS (Jason et al)

http://dsq-sds.org/article/view/938/1113

That would certainly be of interest in terms of secondary analysis of which features responded best to a treatment, but I am not sure qualitative variation can be factored into a primary outcome measure of efficacy of a treatment.
 

lansbergen

Senior Member
Messages
2,512
The poll is on average body temperature. What I've noticed with my daughter is that her temperature is quite variable, even within an hour, and tends to be higher when she feels particularly rough. Generally it would be about 36.6 °C, but it rises to around 37.8 °C when she has a particularly bad moment; an hour or so later (on the odd occasion I've tested it) it's back down to normal.

Fast-fluctuating temperature is not unusual with this disease. Does she ever get a below-normal temperature during these episodes?
 

user9876

Senior Member
Messages
4,556
Fast-fluctuating temperature is not unusual with this disease. Does she ever get a below-normal temperature during these episodes?
No idea. There are occasions when she says she feels hot and asks me to take her temperature, so I wouldn't notice a low temperature.
 

Marco

Grrrrrrr!
Messages
2,386
Location
Near Cognac, France
That would certainly be of interest in terms of secondary analysis of which features responded best to a treatment, but I am not sure qualitative variation can be factored into a primary outcome measure of efficacy of a treatment.

I really need to go back to the start of this thread and remind myself of the original intention, but any usable outcome measure is going to be reductionist. I had suggested measures based on the ability to carry out various tasks, but this would lose some information gathered by symptom-based measures. Maybe on balance that isn't such a great loss (in terms of how successful a treatment is judged), but it would also be possible to gather some 'rich' subjective data, either at the end of the trial or at several points.

Something along the lines of 'In your own words, how has your health/have your symptoms changed since x?' There would be the added burden of carrying out a thematic analysis of any free (non-constrained) textual responses, but unless we're in the rare and lucky position of having a very large sample it shouldn't be insurmountable.
 

Bob

Senior Member
Messages
16,455
Location
England (south coast)
@Jonathan Edwards, I know you're not keen on the SF-36 scales, but I think SF-36 physical function has some benefits:

1. It's widely used, so you can compare scores with the general population and patients with other illnesses. That's been very useful when analysing e.g. the PACE trial outcomes.
2. It doesn't have a ceiling effect in terms of poor health, or severely poor health. So deterioration and improvement in very ill patients can be monitored. (The Chalder Fatigue Scale is a horrendously unhelpful scale because a significant proportion of patients score close to the maximum score, so deterioration can't be measured in all patients.)
3. It's a less relative/subjective questionnaire: It asks questions about engagement in specific activities rather than asking you to rate subjective symptoms in relation to a previous (long-forgotten) health status.
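The point about scores bunching at the extreme of a scale can be checked directly: count the proportion of participants at, or within a small margin of, the worst possible score, who therefore have little or no room to register further deterioration. A sketch; the 0 to 33 range, the 2-point margin and the baseline scores are assumptions for illustration, not data from any trial.

```python
def share_near_worst(scores: list[int], worst: int, margin: int = 2) -> float:
    """Fraction of scores within `margin` points of the worst possible score."""
    near = sum(1 for s in scores if abs(worst - s) <= margin)
    return near / len(scores)


# Hypothetical baseline scores on a 0-33 scale where 33 is the worst.
baseline = [33, 32, 30, 31, 25, 33, 28, 29, 33, 20]
print(f"{share_near_worst(baseline, worst=33):.0%} have almost no room to get worse")
```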

I think perhaps the biggest negative is that, to keep things simple, the questions aren't very nuanced, and so it's not a very subtle measure of change.

I'm not familiar with many other self-report scales, so I can't compare it with many others, but I've always thought that SF-36 physical function is OK-ish for a broad indicator of changes in function.

I don't know if others would agree.
 