PACE Trial and PACE Trial Protocol

anciendaze

Senior Member
Messages
1,841
Just when I checked back to see possible responses to my posts I noticed that histogram. The distribution shows marked kurtosis. This is not a normal distribution. There is also an extended left tail. The boundary for the PACE trial seems to be at a point where the discrepancy between the histogram and a normal distribution is unusually large.
 

Marco

Grrrrrrr!
Messages
2,386
Location
Near Cognac, France
From the paper oceanblue is currently torturing himself with:

US population norms for SF 36 PF subscale as :

Mean 84.15, Standard deviation 23.28, median 90, range 0-100, ceiling (top box) 38.70

And even when SF 36 scores are normalised to a mean of 50, they still don't approximate a normal distribution!
 

Dolphin

Senior Member
Messages
17,567
I'm not sure if it has been said before, but the point of using mean - 1 S.D. is that it is supposed to represent a certain percentage of the population: 15-16% if scores are normally distributed. With a distribution that is not normal, the mean and standard deviation alone don't give enough information to tell the exact percentage. However, the extreme low values inflate the SD, so one knows it isn't 15-16% (and one can be 99%/100% (?) sure that the proportion scoring less than mean - 1 SD is less than 15-16% of the population).
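A minimal sketch of the point above, using a made-up table of score counts (not the SF-36 data). It works out what share of a ceiling-limited, left-skewed sample falls below mean - 1 SD and compares it with the roughly 15.9% a normal distribution would give. The `counts` table and the variable names are assumptions for illustration only.

```python
from statistics import NormalDist, mean, pstdev

# Hypothetical counts per score (100 people), piled up at the ceiling of 100
# with a long left tail. These are NOT the published SF-36 figures.
counts = {100: 40, 95: 15, 90: 10, 85: 8, 80: 6, 75: 4, 70: 3,
          60: 3, 50: 3, 40: 2, 30: 2, 20: 2, 10: 1, 0: 1}
scores = [score for score, n in counts.items() for _ in range(n)]

threshold = mean(scores) - pstdev(scores)      # the "mean - 1 SD" cut-off
share_below = sum(s < threshold for s in scores) / len(scores)
normal_share = NormalDist().cdf(-1)            # share below -1 SD if normal

print(f"threshold             = {threshold:.1f}")
print(f"share below threshold = {share_below:.1%}")
print(f"normal expectation    = {normal_share:.1%}")
```

In this made-up sample the share below the cut-off comes out under the normal figure, but a different shape could give a different answer, which is why the percentage has to be read from the actual distribution rather than from the mean and SD.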
 

Dolphin

Senior Member
Messages
17,567
From the paper oceanblue is currently torturing himself with:

US population norms for SF 36 PF subscale as :

Mean 84.15, Standard deviation 23.28, median 90, range 0-100, ceiling (top box) 38.70

And even when SF 36 scores are normalised to a mean of 50, they still don't approximate a normal distribution!
I haven't looked closely but this looks like data for the whole population and so would include people in their 70s/80s/90s which brings down the average (and would increase the SD presumably), while the PACE authors talk about people of working age.
 

oceanblue

Guest
Messages
1,383
Location
UK
However, the SD will increase by the extreme low values so one knows it isn't 15-16% (and one can be 99%/100% (?) sure the figure with a score less than mean-1SD is less than 15/16% of the population).

That's very helpful, thanks, and this, from the Knoop paper discussion (which also used the mean - 1SD formula) seems to agree:
In determining the threshold scores for recovery we assumed a normal distribution of scores. However, in the healthy population the SIP [used to measure fatigue] and SF-36 scores were not normally distributed. Therefore one could argue that recovery according to the SIP8 has to be defined as scoring the same or lower than the 85th percentile of the healthy reference group. In that case, the recovery rate using the definition of having no disabilities in all domains (i.e. scoring the same or lower than the 85th percentile on the SIP8) would decrease from 26 to 20%. As we do not know the exact distribution of the SF-36 scores, we cannot control for the effects of violation of the assumption of normality.
 

oceanblue

Guest
Messages
1,383
Location
UK
From the paper oceanblue is currently torturing himself with:
US population norms for SF 36 PF subscale as :
Mean 84.15, Standard deviation 23.28, median 90, range 0-100, ceiling (top box) 38.70
And even when SF 36 scores are normalised to a mean of 50, they still don't approximate a normal distribution!
I survived! :D thanks
Looking at the original distribution I posted, nearly half the sample scored the max of 100, even though nearly 30% of the sample were aged 65 or over, so it is similar to the figures you just quoted.
 

Dolphin

Senior Member
Messages
17,567
Another point for a letter occurs to me: they are basically trying to say this normal functioning/"recovered" (former) CFS group is like a group from the population. One could ask them how many have scores of 100/100. I bet there aren't many. It'd be good to get that on the record.
 

Marco

Grrrrrrr!
Messages
2,386
Location
Near Cognac, France
I haven't looked closely but this looks like data for the whole population and so would include people in their 70s/80s/90s which brings down the average (and would increase the SD presumably), while the PACE authors talk about people of working age.

All very true, Dolphin, and the median will also be dragged down by the long tail.

For such a skewed distribution, the measure of central tendency that can be construed as equating to normal is the mode.
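A small sketch of how the three measures can separate in a left-skewed sample with a ceiling. The scores are invented purely for illustration and are not taken from any of the papers discussed.

```python
from statistics import mean, median, mode

# Made-up, left-skewed scores with a ceiling at 100 (illustration only).
scores = [100] * 8 + [95] * 3 + [90] * 2 + [85, 80, 70, 55, 40, 25, 10]

print("mean  ", mean(scores))
print("median", median(scores))
print("mode  ", mode(scores))
```

In this invented sample the long tail pulls the mean down furthest, the median less so, and the mode stays at the ceiling.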
 

Dolphin

Senior Member
Messages
17,567
That's very helpful, thanks, and this, from the Knoop paper discussion (which also used the mean - 1SD formula) seems to agree:
In determining the threshold scores for recovery we assumed a normal distribution of scores. However, in the healthy population the SIP [used to measure fatigue] and SF-36 scores were not normally distributed. Therefore one could argue that recovery according to the SIP8 has to be defined as scoring the same or lower than the 85th percentile of the healthy reference group. In that case, the recovery rate using the definition of having no disabilities in all domains (i.e. scoring the same or lower than the 85th percentile on the SIP8) would decrease from 26 to 20%. As we do not know the exact distribution of the SF-36 scores, we cannot control for the effects of violation of the assumption of normality.
It is annoying that in this (Knoop et al., 2007) paper they don't then recalculate the overall "full recovery" figure. This would bring it down from 23% to a maximum of 20%, but given that the SIP8 figure dropped 6 percentage points, it could go as low as 17%. That's what we get from having people who want to hype the efficacy/effectiveness of CBT.
 

anciendaze

Senior Member
Messages
1,841
If that long left tail is due to random variation, moving to slightly higher scores is much less significant than claimed. The natural hypothesis that the tail is occupied by sick people, and is scarcely random at all, contradicts their central assumption. They appear to be having things both ways.
 

Marco

Grrrrrrr!
Messages
2,386
Location
Near Cognac, France
Are there any genuine statisticians in the house (no offence intended)?


I'm trying to understand the significance of the P (interaction) values detailed in Figure 2 of the PACE paper which compares primary outcomes by treatment and by diagnostic criteria/co-morbid depressive disorder.

The figure footnotes state :

P (interaction) is the p value of the interaction between treatment and criteria or disorder from the adjusted model.

The text states :

"We calculated intraclass correlation coefficients, adjusted for baseline outcomes, using oneway random effects analysis of covariance at 52 weeks within every treatment group. Unadjusted and Bonferroni corrected p values are provided for five comparisons for both primary outcomes. Comparisons of primary outcomes across treatment groups by alternative criteria for chronic fatigue syndrome and myalgic encephalomyelitis, and comorbid depressive disorder included the treatment by criteria or disorder interaction terms. Because some errors were made in stratification at randomisation, we used true status variables rather than status at stratification as covariates."

The p interaction values for the depressive disorder sub group are around 0.93. Is this good, bad, irrelevant?
 

Dolphin

Senior Member
Messages
17,567
I have to admit I don't understand exactly what was done here. However, the reading that using the London ME criteria or the international criteria made no difference seems to work, so I have taken it to mean the same for depression (i.e. the results aren't different for the group with depressive disorder vs everyone). On this reading, p<0.05 would mean there was a significant difference for a subgroup (compared to the top graph). This interpretation seems to work, even though I admit I'm not sure exactly what is being done statistically.
 

Marco

Grrrrrrr!
Messages
2,386
Location
Near Cognac, France
I have to admit also that I'm on a bit of a fishing trip here, basically trying to determine the effect that those with co-morbid depression may have had on the overall results. Eyeballing the charts and data doesn't do it, although one thing that can be said is that there 'appears' to be more variance in the co-morbid depression group.

Again without understanding the statistics, they seem to have conducted an ANOVA/MANOVA to determine if there were any interactions between treatment group outcomes and either baseline scores or diagnostic criteria/co-morbid depression.

If, as you say Dolphin, a p of less than 0.05 compared with the top all-participants group suggests that the subgroup results are significantly different, is it safe to assume then that the larger the p value, the more representative the subgroup results are of the overall results?

If so, this could be interpreted as suggesting that the co-morbid depression subgroup contributed significantly to the overall scores, perhaps disproportionately so, as they made up just 33% of the total cohort.

Which would beg the question, what exactly were CBT and GET treating?
 

anciendaze

Senior Member
Messages
1,841
Yet another mathematical problem, avoiding negative scores

In a break from CROI material, I thought of yet another problem with the activity scale and normal distributions.

Normal distributions have no bound on random error. Numbers can go all the way from negative infinity to positive infinity, though the probability approaches arbitrarily close to zero. While I could imagine mania as negative fatigue, the only way I could imagine a person might have negative physical activity is by decomposing, (a position which I don't feel like defending.)

If we take that distribution of activity scores as defined by mean 100 and SD around 30, zero is a little over 3 SD from the mean. This is not anywhere near impossibility. It is a particular problem when the entire trial takes place over 1 SD below the mean, and some scores are 2 SD below the mean.
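As a quick check of the arithmetic above, this sketch uses the figures from the post (mean 100, SD about 30) and asks how much of a normal curve with those parameters would sit below zero. It assumes nothing about the real data beyond those two numbers.

```python
from statistics import NormalDist

# Figures taken from the post: mean 100, SD about 30.
activity = NormalDist(mu=100, sigma=30)

z_at_zero = (0 - activity.mean) / activity.stdev   # position of zero in SD units
p_negative = activity.cdf(0)                       # probability mass below zero

print(f"zero is {abs(z_at_zero):.2f} SD below the mean")
print(f"P(score < 0) = {p_negative:.5f}")
```

So a normal curve with these parameters would place a small but non-zero share of people at impossible negative scores.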

To avoid this kind of thing, one approach is to transform raw scores into scores that do run from negative infinity to positive infinity. Typically, you transform the mean to zero.

With such a transformation, that left tail looks even worse.
 

Dolphin

Senior Member
Messages
17,567
I have to admit also that I'm on a bit of a fishing trip here, basically trying to determine the effect that those with co-morbid depression may have had on the overall results. Eyeballing the charts and data doesn't do it, although one thing that can be said is that there 'appears' to be more variance in the co-morbid depression group.

Again without understanding the statistics, they seem to have conducted an ANOVA/MANOVA to determine if there were any interactions between treatment group outcomes and either baseline scores or diagnostic criteria/co-morbid depression.

If, as you say Dolphin, a p of less than 0.05 compared with the top all-participants group suggests that the subgroup results are significantly different, is it safe to assume then that the larger the p value, the more representative the subgroup results are of the overall results?

If so, this could be interpreted as suggesting that the co-morbid depression subgroup contributed significantly to the overall scores, perhaps disproportionately so, as they made up just 33% of the total cohort.

Which would beg the question, what exactly were CBT and GET treating?
Well, your logic makes sense. But I'm not sure that would be a strong enough point to print. Unfortunately, although it is not an issue if one doesn't start comparing p-values that are >.05, another interpretation of the p-values might be possible: use depressive status (or criteria status) as a covariate, i.e. "control for this factor". Then the p-value might mean that controlling for depression made the least difference (using the logic above, as the p-value was close to 1). (I don't think this is what they did, as this would normally be explained in a simpler or more straightforward fashion.)
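On the caution about comparing p-values above .05, here is a rough simulation sketch. It is not the PACE model: it is an invented setup with four equal groups, a known SD and a simple z-test, where by construction there is no interaction at all. The function name `interaction_p` and all the settings are assumptions for illustration.

```python
import random
from statistics import NormalDist

def interaction_p(rng, n=50, sigma=1.0):
    """Two-sided p-value for a treatment-by-subgroup interaction in a
    hypothetical setup where the true interaction is exactly zero."""
    def cell_mean():
        # Every cell is drawn from the same distribution.
        return sum(rng.gauss(0.0, sigma) for _ in range(n)) / n

    # (treated - control) in subgroup A minus (treated - control) in subgroup B
    diff_in_diff = (cell_mean() - cell_mean()) - (cell_mean() - cell_mean())
    se = sigma * (4 / n) ** 0.5          # standard error with known sigma
    z = diff_in_diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(0)                   # fixed seed so the run is repeatable
p_values = [interaction_p(rng) for _ in range(2000)]

share_above_090 = sum(p > 0.90 for p in p_values) / len(p_values)
share_050_to_060 = sum(0.50 < p <= 0.60 for p in p_values) / len(p_values)
print(f"share with p > 0.90        : {share_above_090:.3f}")
print(f"share with 0.50 < p <= 0.60: {share_050_to_060:.3f}")
```

With no true interaction, the p-values spread roughly evenly between 0 and 1, so about a tenth land above 0.90 and about a tenth land between 0.50 and 0.60. In a setup like this, a p of 0.93 would be no stronger a sign of similarity than a p of 0.55.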

Anyway, I'm sorry if you or anyone spent mental energy on this based on information I gave that may or may not be correct. I have tried to say that I don't know exactly what is being done, except that, because none of the p-values is below 0.05, they are saying that these factors are not important. They have done something unusual, and I know an experienced researcher who was unclear about what they are doing.

Of course, a bigger problem in all this is the outcome measures themselves. I would be much more interested in actometer data, for example, and how it might vary based on the different criteria. Or perhaps hours worked which they collected.

Also, the definitions for ME and international CFS (i.e. the problems with them) could be partly or wholly the reason for no differences being found.
 

Angela Kennedy

Senior Member
Messages
1,026
Location
Essex, UK
Which would beg the question, what exactly were CBT and GET treating?

NOT neurological dysfunction presentations OF Myalgic Encephalomyelitis, by the looks of it.