Measures of outcome for trials and other studies

user9876 · Mar 27, 2015

Jonathan Edwards said:
I am a bit sceptical about medians. I agree that a mean sounds dodgy but in reality what Bob means by average may be more like 'total for a month'. You might measure total activity by actimeter for 30 days to even things out - and that would in common sense terms indicate 'what I can actually do over a month'. That seems to me to make more sense than the median count for any one day. And you might say that what matters most is not crashing, or having very bad days. Those very bad days might be the outliers with really low actimeter scores. You might then want a 50% improvement to be at least a 50% increase in your WORST day's actimeter score.

My suspicion is that the maths should follow the common sense interpretation of each particular type of score rather than statistical rules for populations of data points. I think maybe it relates back to the very valid point you made before about these not being linear measures much of the time - at least in terms of their human significance.

The problem is where you have a small number of samples, some of which could be outliers. Using the mean and SD tends to let significant outliers (say a trip to the clinic on the last day) dominate the other values. Where sample sizes are bigger and the distributions are right, the mean and median tend to converge. The median is effectively the activity on a typical day rather than the overall activity. In the past I have written stochastic models where we got outputs with the same mean and SD but very different data distributions, and it was the shape of the distribution that changed decisions.

I like the idea of viewing the data as a time series and, say, measuring peaks and troughs (worst), particularly the worst. I guess my real concern is to explore and understand outlying data and not let it skew results.
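To make the mean-versus-median point concrete, here is a minimal Python sketch. The daily counts are invented for illustration (they are not data from any trial) and `summarise` is a hypothetical helper. It only shows how one outlying day, such as the clinic trip, shifts the total and the mean while the median and the worst day stay where they were.

```python
from statistics import mean, median

def summarise(daily_counts):
    """Return the summary statistics discussed above for a list of daily activity counts."""
    return {
        "total": sum(daily_counts),    # 'what I can actually do' over the period
        "mean": mean(daily_counts),    # sensitive to a single unusual day
        "median": median(daily_counts),  # activity on a typical day
        "worst": min(daily_counts),    # the lowest-activity day
    }

# Hypothetical actimeter counts for one week (illustrative values only).
typical_week = [1000, 1100, 900, 1000, 1050, 950, 1000]
# The same week, except that the last day includes a trip to the clinic.
clinic_week = typical_week[:-1] + [4000]

print(summarise(typical_week))
print(summarise(clinic_week))
```

With only seven samples the single clinic day raises the mean by over 40% while the median and the worst day are unchanged, which is the small-sample concern described above. Which of these numbers matters still depends on the common-sense meaning of the score.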

user9876 · Mar 27, 2015

Jonathan Edwards said:
I know that people often say this but my own experience in doing trials is that it is not that important. If two trials using different methods seem to give different results for the same basic question then you need another study that is designed to sort out why there is a difference. Standardising all too often means that you do not take advantage of the best method for a particular situation. For me validation is mostly about getting information about reproducibility or consistency of a method - not about whether it is the right way to answer a question - that is a matter to be determined by careful logical thought on each occasion.

I would have thought that it is better to have trials of the same thing that use different outcomes, as if they all agree it suggests a degree of robustness. I think this comes to a point of replication - it is fine to replicate an experiment but what we really need is a hypothesis that is replicated by a number of different experiments. This helps rule out biases that come from the experimental design rather than from chance.

user9876 · Mar 27, 2015

WillowJ said:
Would having a smart phone be a requirement for the trial or would one be issued to people who have a simple phone or no cell phone at all?

I was thinking it may need to be part of any measurement kit including accelerometers, heart/oxygen monitors etc. You could use a website as a reporting mechanism but I thought that may be harder to use and hence make it harder to get regular inputs.

MeSci · Mar 27, 2015

alex3619 said:
Adding to my previous comment, Dr Martinovic created a detailed symptom checklist which in retrospect seems remarkably like a checklist of autonomic, metabolic, vascular and neurologic issues in clusters. Patients rated their response both in absolute terms and in relative terms, or in other words about current status and changes in status.

Dr Martinovic's work sounds very interesting, and it would be good to revisit it and build on it if necessary/appropriate.

I have been persuaded by the arguments that for subjective measures, apart from things that can be reported in absolute terms, e.g. "Can you do this three times in a row in the space of a minute?", and apart from patients who have not been ill for long so can still remember what 'normal' feels like, relative improvements will be easier to report than absolute ones.

A couple of main things have brought me to this (provisional) opinion.

1. I have recently had something that is definitely different from 'normal' PEM and seems like a cold or flu, but it is so long since I have had a cold or flu it is hard to be sure. I've had to look up the symptoms, and try to remember what a cold or flu feel like, and focus really hard on my symptoms, to try to figure it out. It would be quite easy for a person who does not get PEM to recognise such symptoms as a cold or flu, I think. So the way we perceive things is often unusual.

2. There was discussion about doing various tasks 'without difficulty'. That really brought it home to me that I have difficulty doing a lot of everyday tasks - maybe most. I had just got so used to my 'new normal' that I had become almost unaware of the difficulty. So when asked to assess my level of absolute functioning I am perhaps likely to overestimate it, and maybe most of us do.

Snow Leopard · Mar 27, 2015

Jonathan Edwards said:
So one measure is improvement of fatigue on visual analogue scale. Some patients do not think of themselves as having fatigue so they would opt out of that one and be scored on something else.

The main problem with visual analogue scales for things like fatigue is that we have no frames of reference, because the question is quite abstract.
You would expect a large range in severity to score the same. This also leads to a problem in measuring improvement: over time, the person participating may become more self-aware or gain a deeper understanding of the question and therefore answer differently, independently of underlying changes in health.

Abstract notions of fatigue are also why many people do not understand ME/CFS and say stuff like "I get tired after activity too", with no conception of the large difference in severity.

I prefer scoring in discrete steps too (even if the participant is shown a continuous scale) - requiring a clear, substantial and unambiguous step up or down to show improvement or decline.
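As a rough illustration of scoring in discrete steps, here is a minimal Python sketch. It assumes a 0 to 100 visual analogue scale split into five equal bands; the band count, the scale range and the function names `to_step` and `step_change` are all assumptions made for the example, not anything proposed in the thread.

```python
def to_step(vas_score, n_steps=5, scale_max=100):
    """Map a continuous visual analogue score (0..scale_max) onto one of n_steps bands."""
    if not 0 <= vas_score <= scale_max:
        raise ValueError("score is outside the scale")
    width = scale_max / n_steps
    # The very top of the scale is placed in the highest band.
    return min(int(vas_score // width), n_steps - 1)

def step_change(before, after, n_steps=5, scale_max=100):
    """Number of whole bands moved between two assessments (0 means no recorded change)."""
    return to_step(after, n_steps, scale_max) - to_step(before, n_steps, scale_max)

print(step_change(42, 58))  # both scores fall in the same band, so no change is recorded
print(step_change(42, 61))  # the second score falls in the next band up
```

Plain banding like this still lets a small move across a band boundary count as a full step, so a real scheme would probably also need a minimum raw difference before a step is accepted.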

In terms of the questions on PEM, there clearly needs to be some actual understanding of human behaviour and thoughts before asking questions. So not simply questions about 'do you suffer PEM', but 'do you carefully manage your activity level to avoid symptoms'.
Questions related to sleep could be made more specific too, but also placed into context by other questions. For example, beyond asking 'are you able to maintain a regular sleeping rhythm?', ask 'are there external (specifically mentioned) problems that prevent this?' (if so, the sleeping rhythm question might be given less weight while the external problems continue). Yes, this makes the whole questionnaire messy in a scoring sense, but it also means the results will make more sense on a human level.

I think questions that are less vague, where it is easier to see which answer is most appropriate because the question has been placed into context, are also easier to answer and less likely to lead to participants becoming frustrated and to the 'questionnaire fatigue' that I mentioned previously.

Snow Leopard · Mar 27, 2015

MeSci said:
I have been persuaded by the arguments that for subjective measures, apart from things that can be reported in absolute terms, e.g. "Can you do this three times in a row in the space of a minute?", and apart from patients who have not been ill for long so can still remember what 'normal' feels like, relative improvements will be easier to report than absolute ones.

A couple of main things have brought me to this (provisional) opinion.

1. I have recently had something that is definitely different from 'normal' PEM and seems like a cold or flu, but it is so long since I have had a cold or flu it is hard to be sure. I've had to look up the symptoms, and try to remember what a cold or flu feel like, and focus really hard on my symptoms, to try to figure it out. It would be quite easy for a person who does not get PEM to recognise such symptoms as a cold or flu, I think. So the way we perceive things is often unusual.

2. There was discussion about doing various tasks 'without difficulty'. That really brought it home to me that I have difficulty doing a lot of everyday tasks - maybe most. I had just got so used to my 'new normal' that I had become almost unaware of the difficulty. So when asked to assess my level of absolute functioning I am perhaps likely to overestimate it, and maybe most of us do.

Some great points!
Questions that are relative, e.g. 'are things better or worse than in the past?', should be avoided altogether. When you have been ill for a long time, you forget what is normal!

Likewise, 'with difficulty' is too vague. Such questions should try to describe what such levels of difficulty actually mean in terms of impact on life activities.

Bob · Mar 27, 2015

Jonathan Edwards said:
And it seems that the deconditioning enthusiasts have mostly decided there isn't deconditioning anyway!

Glad you noticed that. It was rather a quiet u-turn for such a drastic change in their model of illness! Just fear-avoidance to go now!

Snow Leopard · Mar 27, 2015

Jonathan Edwards said:
I think toolkits for standard measures are much of the problem - SF36 screwdrivers, Minnesota mole wrenches etc. Every trial is different and all researchers must be free to use new methods for new problems - but they must also be prepared to defend them against reasonable criticism!

There is a difference between requiring everyone to measure exactly the same things and having a standardised minimum set of data, such as that proposed in the following paper. Researchers are still free to use their own measures beyond this if they believe they would be more useful.

http://www.ncbi.nlm.nih.gov/pubmed/22306456

Brain Behav Immun. 2012 Mar;26(3):401-6. doi: 10.1016/j.bbi.2012.01.014. Epub 2012 Jan 28.
Minimum data elements for research reports on CFS.
Jason LA, Unger ER, Dimitrakoff JD, Fagin AP, Houghton M, Cook DB, Marshall GD Jr, Klimas N, Snell C.
Abstract
Chronic fatigue syndrome (CFS) is a debilitating condition that has received increasing attention from researchers in the past decade. However, it has become difficult to compare data collected in different laboratories due to the variability in basic information regarding descriptions of sampling methods, patient characteristics, and clinical assessments. The issue of variability in CFS research was recently highlighted at the NIH's 2011 State of the Knowledge of CFS meeting prompting researchers to consider the critical information that should be included in CFS research reports. To address this problem, we present our consensus on the minimum data elements that should be included in all CFS research reports, along with additional elements that are currently being evaluated in specific research studies that show promise as important patient descriptors for subgrouping of CFS. These recommendations are intended to improve the consistency of reported methods and the interpretability of reported results. Adherence to minimum standards and increased reporting consistency will allow for better comparisons among published CFS articles, provide guidance for future research and foster the generation of knowledge that can directly benefit the patient.

alex3619 · Mar 27, 2015

MeSci said:
There was discussion about doing various tasks 'without difficulty'.

Here is my complete list of tasks I can do without difficulty:

.

MeSci · Mar 27, 2015

Snow Leopard said:
Since a Bayesian approach was mentioned, I am wondering about what this means in terms of prior probabilities of improvement and personalisation of measures. E.g. in terms of improvement, someone with a well below average activity level would be expected to return to the group/population mean if the treatment was highly successful. But if on another measure, say some measure of OI, or heart rate variability, or sleep issues etc., they were more or less normal before the treatment, then requiring some degree of improvement on this measure to achieve some sort of improvement grade would not make sense.

I am starting to see the usefulness of the composite measure, that doesn't intend to measure improvement on each and every measure and claim an improvement = the treatment works. The key being achieving a certain grade across all measures relative to a norm perhaps? - so zero improvement might be necessary on a particular measure, if the result was already acceptable on that measure. But a very large improvement on another measure might be necessary to achieve a particular grade, since the result on that measure was well below average.

The only problem with this approach is avoiding the statistical noise already discussed, due to day to day variation. Sampling over time is possible, but obviously places additional burdens on the participants.

Which actually brings up a key point. "Questionnaire fatigue". I don't mean in an illness sense, but in a psychological sense where people get overwhelmed with excessively long lists of questions and answer less accurately as a result.

Yes - questionnaire fatigue is very much an issue for me. Also, one of those systems whereby you get unexpected calls, texts, etc. and are asked to report symptoms at that moment, would be useless for me. If I get a phone call that starts by asking how I am, I am stumped for an answer. I have just had to abandon what I was doing, hope that I left it in a safe condition and that the cats won't walk on my laptop keyboard and destroy my work, I have had to get up quickly, often dropping things on the floor in my haste, grab the phone, remember how to answer it (!), then try to figure out who is calling. This happened the other day - it was a friend. How was I? I really had no idea - for a few seconds I don't know if I was even sure who I was! I can't remember my first few words, but I don't think they were very coherent as I was dazed. I often can't easily think and speak at the same time.

EDIT

Meant to say - it is useful to know about improvements in some parameters on their own, as overall treatment may require several treatments at the same time - different treatments for different problems - so it is useful to know if one treatment improves the thing you were hoping it would improve.

alex3619 · Mar 27, 2015

Smart phones have huge limitations, even if you give one to patients for a study. Many of us cannot read them. They are too small. I am sitting here on my laptop, with a normal laptop screen, and it's plugged into my big monitor. I really struggle with a laptop screen.

Also, what do you do about patients who are in a state where they cannot read? Here is what a survey question would look like, only so blurred we could hardly see it:

Blah blah blah, blah blah blah blah blah blah. Blah? Blah. Blah. Blah. Blah.

I have had times when I could not read, nor write, nor use a computer. This is primarily about the limitations of severe patients.

MeSci · Mar 27, 2015

alex3619 said:
I would like to add a comment about timing of tests, to add to what has already been said.

I have a difficult time with doctors who think that tests need to be fasting and early morning, for example. They cannot comprehend I have no morning, or that sometimes I do not sleep at all. Timing of tests is critical for some tests. Yet what does this mean if you are supposed to have an overnight fast, but that overnight is the only time you are awake? And you are diabetic? I argue with new doctors on these issues far too often.

Standardized early morning or post prandial or whatever kind of test you want is highly problematic in a disease with multiple shifts in circadian patterns, both in terms of chemistry (e.g. cortisol) and behaviour (e.g. sleep). My early morning tests might be the middle of my sleep time, or the middle of the day, or in the middle of several days without any sleep at all, or even far too much sleep.

Absolutely. A prime example of medicine being designed without taking into account the needs and idiosyncrasies of real human patients.

mango · Mar 27, 2015

alex3619 said:
Smart phones have huge limitations[...]

and the radiation can be a huge problem too. that's why i can't use them.

eafw · Mar 27, 2015

Valentijn said:
It's one of my favorite ways to monitor my capabilities, but definitely needs additional context to understand what's happening over time.

Old fashioned note keeping can work. Just take measurements through the course of the day and record the activity. This would need to be done even if someone had a continuous wrist monitor, but again the key thing is lots of measurements over weeks and months. We would be looking for trends, so not a short term thing.

MeSci said:
There was discussion about doing various tasks 'without difficulty'.

Snow Leopard said:
So not simply questions about 'do you suffer PEM', but 'do you carefully manage your activity level to avoid symptoms'.

Yes, and this comes back to the fact that whoever is doing the research or setting up the measures needs to really understand the illness and the way that people live with it.

There is also the issue of subgroups, both in terms of severity and symptom cluster-types. The researchers need to understand this as well and it will help with the problem of endless meaningless and useless questionnaires.

alex3619 · Mar 27, 2015

mango said:
and the radiation can be a huge problem too. that's why i can't use them.

Just a couple of years ago we had no understanding of electromagnetic sensitivity. What we were missing is that there is a huge body of literature on this in zoology. The mechanism is known. Most of the popular theories are clearly wrong. Marty Pall pointed this out.

eafw · Mar 27, 2015

alex3619 said:
I have had times when I could not read, nor write, nor use a computer. This is primarily about the limitations of severe patients.

Which is why we will not get a "one size fits all" set of outcome measures. It does need to be tailored to each group or individual depending on what the researcher is trying to achieve.

Bob · Mar 27, 2015

The relative nature of our fluctuations in health has been discussed above...
I agree that it's very difficult to answer questions about our health in abstract or relative terms.
I've been thinking more about this...

I think Alex mentioned that a researcher or physician had invented a checklist to assess symptoms (but I didn't see any further details posted about this)...

I wonder if a sort of checklist questionnaire might be helpful, in terms of measuring physical function, but I think perhaps such a questionnaire wouldn't be able to pick up on subtleties very well... I'm thinking of a questionnaire which lists specific activities in words or images, such as 'cleaning teeth', 'showering', 'cooking a meal', 'leaving the home', 'walking outside the home', 'walking 50m/100m/200m/500m/1000m' etc. And the patient would indicate how many of these activities they'd engaged in over the past day or week. Or, better still, they'd fill in the questionnaire online on a daily basis. This would remove much of the subjectivity and relativity from a self-report questionnaire. I can see weaknesses in such an approach though, especially relating to differences in lifestyle between individuals.
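A daily checklist of this kind is simple to record and tally. The following is only a sketch in Python: the item list is a shortened copy of the examples above, the diary entries are invented, and `daily_score` and `weekly_counts` are hypothetical names rather than part of any existing questionnaire.

```python
from datetime import date

# Shortened, hypothetical checklist based on the examples in the post above.
ACTIVITIES = [
    "cleaning teeth",
    "showering",
    "cooking a meal",
    "leaving the home",
    "walking outside the home",
]

def daily_score(done):
    """Count how many checklist activities were reported for one day."""
    unknown = set(done) - set(ACTIVITIES)
    if unknown:
        raise ValueError(f"not on the checklist: {sorted(unknown)}")
    return len(set(done))

def weekly_counts(diary):
    """For each checklist activity, count on how many days it was reported."""
    return {a: sum(a in done for done in diary.values()) for a in ACTIVITIES}

# Invented diary: one set of ticked activities per day.
diary = {
    date(2015, 3, 23): {"cleaning teeth", "showering"},
    date(2015, 3, 24): {"cleaning teeth"},
    date(2015, 3, 25): {"cleaning teeth", "cooking a meal", "leaving the home"},
}

for day, done in diary.items():
    print(day.isoformat(), daily_score(done))
print(weekly_counts(diary))
```

Counting ticks treats every activity as equal, which is where the lifestyle differences mentioned above would show up: the same count can mean very different things for two people.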

I think we could design such a questionnaire between us, but perhaps it would be too complicated to test it to see if it was useful.

Anyway, it's just a thought.

MeSci · Mar 27, 2015

Snow Leopard said:
It would be expected that given this question at the start of the trial, due to the human bias towards optimism, more patients would expect to be in the active treatment arm.

Do you think so? That would appear to be very blind optimism, as there is no reason I can think of why a person would be more likely to be in one group rather than another. I expect my own reply would be "Absolutely no idea" or equivalent! But I am unusually logical, I think (borderline Asperger's).

Valentijn · Mar 27, 2015

alex3619 said:
I have had times when I could not read, nor write, nor use a computer. This is primarily about the limitations of severe patients.

For severe patients at least, they should have caregivers who could answer for them, or help with it. Though it could still present a problem for patients without caregivers who are generally more functional, but have hit a really bad patch.

Valentijn · Mar 27, 2015

I don't really like the idea of manually keeping track of activities all day. It would be a pain in the butt, we'd forget to do it half the time (at least!), and it would be exhausting.

At most I'd think a short and simple questionnaire every day would be appropriate. But for daily activity details, actometers are much easier as well as more objective.
