PACE Trial and PACE Trial Protocol

oceanblue

Guest
Messages
1,383
Location
UK
People might be interested in this paper:
Song, S., & Jason, L.A. (2005). A population-based study of chronic fatigue syndrome (CFS) experienced in differing patient groups: An effort to replicate Vercoulen et al.’s model of CFS. Journal of Mental Health, 14, 277-289. Retrieved from http://www.cfids-cab.org/cfs-inform/...ng.jason05.pdf - free full text there
Interesting paper and I have to admit that Structural Equation modelling is well beyond me. However, I noted this in the conclusion:
There were several limitations in the present study. Several of the measures used were
designed specifically for this study, and there were no reliability estimates for them. Another
important limitation of the present study was the small sample sizes. However, the fact that
the Vercoulen et al. model did fit with the chronic fatigue group due to psychiatric reasons
does indicate that there was enough power in even this small sample. Clearly, there is a need
for additional studies in this area, with larger sample sizes
So it seems to be a case of challenging one flawed study with another flawed study, with nothing really resolved - a depressingly familiar scenario in CFS research.
 

anciendaze

Senior Member
Messages
1,841
normal distribution of error?

Here's a link to an interesting discussion by statisticians on why assume a normal distribution. There is some refreshing candor about convenience versus rigor.

I also want to point out you can have Gaussian distributions in several dimensions. (Think of bullet holes in targets.) Three dimensions led to the Maxwell-Boltzmann distribution for magnitudes of velocities. In two dimensions, you might derive a Rayleigh distribution.

Even though the underlying variations are normal, this is no guarantee for derived quantities. Both these distributions I mention are absolute values (root-mean-square) of differences from the mean. Therefore, negative values are impossible. (Is it possible for the radial distance from the center of a cluster to a bullet hole to be negative?) Both distributions are therefore one-sided. Both show considerable skew, kurtosis and extended tails, if interpreted as normal.
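
For anyone who wants to see this numerically, here is a minimal Python sketch. It assumes independent, zero-mean Gaussian coordinates with the same SD in every dimension; the function names, sample size and seed are made up for illustration. It draws points in two and three dimensions and looks at the distance from the centre, which should behave like Rayleigh and Maxwell-Boltzmann samples respectively.

```python
import math
import random
import statistics

def radial_distances(dims, sigma, n, seed=0):
    """Distance from the origin of n points whose coordinates are
    independent Gaussians with mean 0 and standard deviation sigma."""
    rng = random.Random(seed)
    return [
        math.sqrt(sum(rng.gauss(0.0, sigma) ** 2 for _ in range(dims)))
        for _ in range(n)
    ]

def skewness(values):
    """Sample skewness (third standardised moment)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

# Illustrative settings only (assumptions, not data from any study).
rayleigh_like = radial_distances(dims=2, sigma=1.0, n=20000)  # bullet holes on a target
maxwell_like = radial_distances(dims=3, sigma=1.0, n=20000)   # speeds in three dimensions

for name, sample in [("2-D", rayleigh_like), ("3-D", maxwell_like)]:
    print(name,
          "min", round(min(sample), 3),
          "mean", round(statistics.fmean(sample), 3),
          "median", round(statistics.median(sample), 3),
          "skew", round(skewness(sample), 3))
```

Every coordinate here is perfectly normal, yet the derived distance is never negative, its mean sits above its median, and its skewness is positive.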

As a more visual example, consider taking data from electron micrographs. The structures imaged are three-dimensional, but measurements are made on two-dimensional images. You may even model radial error in two-dimensions with a single variable, reducing a three-dimensional error to one dimension. This kind of projection leads to predictable statistical distortions.

Suppose you did not know the data were derived that way. Is there any inherent characteristic to reveal it? If there were no negative values, that would be a clue. If the mean, median and mode were different, that would be another clue. If you don't know the original number of dimensions, the kurtosis and tail might allow you to guess.

The point of these observations for PACE is that we see such characteristics in the population data assumed to model well as normal. The values of 0 and 100 are completely arbitrary, so are units. If we assume it measures impairment, not health, we could explain the absence of numbers above 100 as a restriction to non-negative values of impairment from an assumed standard value of 100. There would be a natural bound where impairment leads to death, which is not reached because all subjects are living. Construct a histogram based on an assumed distribution like Rayleigh or Maxwell-Boltzmann, and you will see how well it fits population data.

Even though we can't see the mathematical space in which health and illness are measured, we can see the shadows cast through data. This appears to be measuring random processes in higher dimensions projected onto a one-dimensional scale.
 

oceanblue

Guest
Messages
1,383
Location
UK
PACE GET only has a small effect on physical conditioning - official

GET is based on the theory that CFS is perpetuated by physical deconditioning and the only outcome of physical condition is the 6MWT. I've already posted that the improvement in 6MWT for GET (relative to SMC control) is below the 'clinically useful difference' (CUD) threshold of 0.5 baseline SD.

However, the CUD measure was only specified by the authors for the primary outcomes of fatigue and physical function. So instead I've looked at the 6MWT with a generic measure of effect size, Cohen's d, which is widely used to compare medical studies, particularly in meta-analyses. In other words, it's perfectly appropriate to apply this measure to the 6MWT.

The Cohen's d for GET 6MWT is 0.34, and crucially that ranks as a small effect (which is consistent with the increase not making a 'clinically useful difference').

This means that GET, a therapy based on treating a perceived physical deconditioning, makes only a small difference to physical condition after one year. This in turn suggests that a) the therapy isn't much good and b) the deconditioning theory it's based on is probably wrong too.

I can almost feel a letter coming on.

ps I know the basic point here has been made before, but using a well-regarded measure like Cohen's d to quantify it adds weight to the point.
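
For reference, a minimal sketch of how a Cohen's d can be computed and labelled. The post does not say which variant was used, so the pooled-SD form below is an assumption, and the input numbers are hypothetical placeholders, not the PACE 6MWT data. The 0.2 / 0.5 / 0.8 cut-offs are Cohen's conventional benchmarks.

```python
import math

def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardised mean difference using the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

def cohen_label(d):
    """Cohen's conventional benchmarks: 0.2 small, 0.5 medium, 0.8 large."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"

# Hypothetical numbers only, NOT taken from the trial.
d = cohens_d(mean_a=110.0, sd_a=20.0, n_a=50, mean_b=100.0, sd_b=20.0, n_b=50)
print(round(d, 2), cohen_label(d))  # 0.5 medium

# The value reported in the post falls in the "small" band.
print(cohen_label(0.34))            # small
```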
 

anciendaze

Senior Member
Messages
1,841
Perturbations

My previous interpretation of the population data for physical activity as a one-dimensional value derived from random processes in several dimensions (which are not directly observed) has a surprising implication. It leads to a very simple model in which a subpopulation with larger deviations shows up with reduced mean and very approximately normal distribution.

All I need to assume is that the subpopulation is more sensitive to random perturbations than the general population. You might even catch a member of this subpopulation performing at healthy levels, but you would have to be quick. They would spend most of their time bouncing around on the outer fringes. Their mean performance would appear unusually low on the scale I described. (Sound like anyone you know?) Crucially, the SD of this subpopulation would not be a good measure for estimating the SD of the general population.

Graphs of Maxwell-Boltzmann distributions for different values of SD illustrate the way this shift in apparent mean connects to changes in SD for random processes in three dimensions. Elaborate psychological interpretations are not required.
 

Dolphin

Senior Member
Messages
17,567
GET is based on the theory that CFS is perpetuated by physical deconditioning and the only outcome of physical condition is the 6MWT. I've already posted that the improvement in 6MWT for GET (relative to SMC control) is below the 'clinically useful difference' (CUD) threshold of 0.5 baseline SD.

However, the CUD measure was only specified by the authors for the primary outcomes of fatigue and physical function. So instead I've looked at the 6MWT with a generic measure of effect size, Cohen's d, which is widely used to compare medical studies, particularly in meta-analyses. In other words, it's perfectly appropriate to apply this measure to the 6MWT.

The Cohen's d for GET 6MWT is 0.34, and crucially that ranks as a small effect (which is consistent with the increase not making a 'clinically useful difference').

This means that GET, a therapy based on treating a perceived physical deconditioning, makes only a small difference to physical condition after one year. This in turn suggests that a) the therapy isn't much good and b) the deconditioning theory it's based on is probably wrong too.

I can almost feel a letter coming on.

ps I know the basic point here has been made before, but using a well-regarded measure like Cohen's d to quantify it adds weight to the point.
Well done. And don't forget that for CBT it would be a tiny bit negative (but it might as well be remembered as no difference, as the difference is not statistically significant).

The other point about the 6MWT is that the difference with APT would be similar to what you've given for both GET and what I gave for CBT.
 

Dolphin

Senior Member
Messages
17,567
Nice quote.

Re Top Box, I think 100/100 might be a bit too high. I've tried to estimate the data from the graph in the Bowling study - only 57% scored in the top box, and it looks like the top box is scores of 95 or 100.
I can't remember who drew my attention to the following:
Velanovich V.
Behavior and analysis of 36-item Short-Form Health Survey data for surgical quality-of-life research.
Arch Surg. 2007 May;142(5):473-7; discussion 478. http://archsurg.ama-assn.org/cgi/content/full/142/5/473

Anyway, Table 2 gives the figures for the US general population: 38% have a score of 100 on the physical functioning subscale (I wonder whether the data are out there for people of working age). He was suggesting 100 could be used.

Also, not sure if the figure was given earlier but for the US general population (including very old people), this paper says the mean (SD) is 84.15 (23.28) which again makes one doubt the figures that the PACE Trial paper gave for the working age population:
equal to or above the mean minus 1 SD
scores of the UK working age population of 84 (–24) for
physical function (score of 60 or more).
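
A quick sketch of what the "mean minus 1 SD" rule implies if the scores really were normal. It uses only figures quoted above; the normality itself is the assumption being tested, and the variable names are just for illustration.

```python
from statistics import NormalDist

# Figures quoted from the PACE paper for the working age population.
mean, sd = 84.0, 24.0
threshold = mean - sd                       # the "score of 60 or more" cut-off
paper_model = NormalDist(mean, sd)
share_below_threshold = paper_model.cdf(threshold)

# US general population mean (SD) quoted above from the Velanovich paper.
us_model = NormalDist(84.15, 23.28)
share_above_ceiling = 1 - us_model.cdf(100.0)

print("threshold:", threshold)
print("share below threshold if normal:", round(share_below_threshold, 3))
print("share above 100 if normal:", round(share_above_ceiling, 3))
```

Under a normal model about 16% of people would score below the threshold, and roughly a quarter would have to score above 100, which the scale does not allow. Set that against the 38% reported as scoring exactly 100 and the normal model looks like a poor description of these data.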
 

anciendaze

Senior Member
Messages
1,841
There was never any doubt in my mind the physical activity scale was not intended to rank athletes. It is strictly to measure health, and shortcomings from health. This means scores are cut off at 100. The distribution is one-sided.

For an approximate Gaussian distribution, mean, median and mode are likely to be the same. For the population figures, these are significantly different.

In thinking through 'distributions I have known', the Maxwell-Boltzmann came to mind. It has no negative values. If I treat 100 on the current scale as zero impairment and 0 as 100% impairment, the graph for the Maxwell-Boltzmann distribution gets flipped right for left. This puts the mode close to 100. The long right tail on the M-B distribution becomes the long left tail on physical activity. For a small value for SD, in the 3-D space, the values look like a good fit. For a large value for SD, describing a subpopulation in 3-D space, we get something very similar to the distribution seen in patient groups.

The astounding thing, for me, is that this assumes the mean in that 3-D space (presumably describing physiology) does not have to differ at all for patients and general population to produce the apparent shift in means. In this model, the apparent mean on the scale is the result of higher variance in the patient groups. Any effect selectively removing those patients with large deviations, in any direction in that 3-D space, will result in a group with lower SD in that space, and an apparent shift in the mean on the one-dimensional scale. In this interpretation, the trial didn't shift the mean at all for any group.
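
Here is a rough simulation of that idea, offered as a sketch rather than a fitted model. Both groups are centred on the same point in a 3-D space; only the spread differs. The score is taken as 100 minus the distance from the centre, and I have floored it at 0, which is my own assumption to keep scores on the scale. The two sigma values are arbitrary illustrations, not estimates from any data set.

```python
import math
import random
import statistics

def simulated_scores(sigma, n=20000, dims=3, seed=1):
    """Score = 100 minus the distance from a common centre (full health),
    floored at 0. Coordinates are Gaussian with mean 0 and SD sigma."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n):
        r = math.sqrt(sum(rng.gauss(0.0, sigma) ** 2 for _ in range(dims)))
        scores.append(max(0.0, 100.0 - r))
    return scores

# Hypothetical spreads: same centre, different sensitivity to perturbation.
general = simulated_scores(sigma=10.0)
patients = simulated_scores(sigma=35.0)

for name, scores in [("general", general), ("patients", patients)]:
    print(name,
          "mean", round(statistics.fmean(scores), 1),
          "SD", round(statistics.pstdev(scores), 1))
```

The group with the larger spread ends up with a much lower mean score even though the underlying centre never moved, which is the shift in apparent mean described above.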

This is an extreme counter argument. I did not go into this expecting to find such an idea working so well. It may have applications to studies unrelated to ME/CFS, such as those surgery studies Velanovich analyzed. Variability may be the most important feature of ME/CFS patients. This matches a great deal of personal experience.
 

Dolphin

Senior Member
Messages
17,567
If anyone wants to send in a reply to this, it'd be appreciated.
It was included in a free newspaper for Irish doctors.


Last year, after running a piece on the Santhouse et al. editorial in the British Medical Journal, they published not one but five letters over a series of weeks (from John Greensmith, Tom Kindlon, Gerwyn Morris, Orla Ni Chomhrai and Vance Spence, only two of whom have Irish addresses). That was most of the people who wrote in, as I recall.
They may be glad to fill up space in their newspaper.


People can also put comments online but letters would be preferred. You can always post your letter as a comment if you prefer.


If you sent in a letter to the Lancet, you could get a chance to re-use it (ordinary newspapers might find it too technical). Probably best to not put the references underneath - just put the name of the first author + et al. + year in brackets e.g. (White et al., 2011) to refer to Lancet paper. If you want me to look at it, feel free.

References aren't essential of course.

Even if your point doesn't relate to what is in the Irish Medical Times article, one can still criticise the study.


Probably best to keep letters under 400 words and ideally less than that again.
Address is: editor@imt.ie that's editor @ imt.ie


Don't forget to put your address in the letter and also a telephone number (which won't be published).


Thanks



http://bit.ly/hAvLon
i.e.
http://www.imt.ie/clinical/2011/03/cognitive-behavioural-therapy-not-harmful-in-chronic-fatigue.html

Cognitive behavioural therapy not harmful in chronic fatigue

March 18, 2011 By admin

Patient groups’ concerns that cognitive behavioural therapy (CBT) and graded exercise therapy could be harmful for the treatment of chronic fatigue syndrome can be allayed due to a large study showing that both are effective and safe.


But the randomised PACE trial of nearly 650 patients did find that adaptive pacing therapy (APT) – a therapy sometimes favoured by patient groups – was not more helpful in reducing fatigue or physical function than specialist medical care alone (SMC), contrary to the researchers’ initial hypothesis.

The British researchers randomised 160 people to each of the four treatment groups: CBT, GET or APT combined with specialist medical care, and a final group with specialist medical care only.

GET was based on “deconditioning and exercise intolerance theories of chronic fatigue” and consisted of negotiated, gradual increases in exercise intensity over the period of intervention. APT was based on the “envelope theory of chronic fatigue” and consisted of identifying links between activity and fatigue followed by a plan to avoid exacerbations.

Before treatment began, patient expectations were high for both APT and GET but lower for CBT and SMC, the researchers reported.

Those treated with CBT or GET in combination with SMC did better with respect to both primary outcomes — fatigue, measured on the Chalder fatigue questionnaire and physical function, measured on the short form-36 physical function subscale.

The researchers concluded that both treatments were effective for chronic fatigue with “moderate” effect sizes. They suggested that the lack of benefit for APT combined with SMC could have been a result of the greater than expected improvement with SMC alone.

There were no more adverse reactions to the behavioural interventions than specialist care alone, a finding that was important according to two researchers from the Expert Centre for Chronic Fatigue in the Netherlands.

“This finding is important and should be communicated to patients to dispel unnecessary concerns about the possible detrimental effects of cognitive behaviour therapy and graded exercise therapy, which will hopefully be a useful reminder of the potential positive effects of both interventions,” they wrote in an accompanying editorial.

Lancet 2011; Online. doi:10.1016/S0140-6736(11)60096-2
I just sent in a letter now. I had got some feedback/input from some of the contributors to this thread.

I'm not sure I should post it somewhere where it might show up for search engines but I'm happy to show it to anyone who is thinking of writing in in case you want to avoid duplication. However, if they do it like last year, they might include one letter a week (they published five last year) and so they probably won't mind if some of the points are the same. The chances of all the points being the same are small. Thanks.
 

Dolphin

Senior Member
Messages
17,567
There was never any doubt in my mind the physical activity scale was not intended to rank athletes. It is strictly to measure health, and shortcomings from health. This means scores are cut off at 100. The distribution is one-sided.

For an approximate Gaussian distribution, mean, median and mode are likely to be the same. For the population figures, these are significantly different.

In thinking through 'distributions I have known', the Maxwell-Boltzmann came to mind. It has no negative values. If I treat 100 on the current scale as zero impairment and 0 as 100% impairment, the graph for the Maxwell-Boltzmann distribution gets flipped right for left. This puts the mode close to 100. The long right tail on the M-B distribution becomes the long left tail on physical activity. For a small value for SD, in the 3-D space, the values look like a good fit. For a large value for SD, describing a subpopulation in 3-D space, we get something very similar to the distribution seen in patient groups.

The astounding thing, for me, is that this assumes the mean in that 3-D space (presumably describing physiology) does not have to differ at all for patients and general population to produce the apparent shift in means. In this model, the apparent mean on the scale is the result of higher variance in the patient groups. Any effect selectively removing those patients with large deviations, in any direction in that 3-D space, will result in a group with lower SD in that space, and an apparent shift in the mean on the one-dimensional scale. In this interpretation, the trial didn't shift the mean at all for any group.

This is an extreme counter argument. I did not go into this expecting to find such an idea working so well. It may have applications to studies unrelated to ME/CFS, such as those surgery studies Velanovich analyzed. Variability may be the most important feature of ME/CFS patients. This matches a great deal of personal experience.
Not sure if this is what you are saying, but what you have written has got me thinking that one might argue that part of the improvement/positive change is "regression to the mean" (the mean being the overall mean of CFS taken from a distribution which would involve higher values than 65).
 

anciendaze

Senior Member
Messages
1,841
Not sure if this is what you are saying, but what you have written has got me thinking that one might argue that part of the improvement/positive change is "regression to the mean" (the mean being the overall mean of CFS taken from a distribution which would involve higher values than 65).
"Regression toward the mean" is not exactly what I mean, but gets into other arguments about significance in a short series of measurements. My claim is that the apparent mean observed in samples need not reflect any change at all in the mean of the distribution from which the measure is derived. A point sitting right at the mean value could still result in either population, all that has changed is the probability. Naively calculating means in the observed measure when a distribution demonstrates extreme departure from Gaussian behavior produces nonsense. When a distribution lacks symmetry, (is heavily skewed) shifts in apparent mean will be a necessary consequence of changes in variance.

My original motivation for introducing the M-B distribution was to provide an example in which Gaussian random behavior in a different space is reflected in non-Gaussian variation in a derived measure. Even if Gaussian behavior is at the root of the problem, you have no guarantee it will appear directly in some arbitrary measure you choose.

This is not a finished theory. I am still looking for someone to work with me on it. The distribution which I've proposed still allows arbitrarily large deviations from health which could produce negative physical activity. I'm looking for bounded distributions with similar behavior or a transformation on this one which makes all possible numerical scores meaningful. (zombie eradication)

I'll admit to having a peculiar preference for doing analysis on numbers which are meaningful to begin with.
 

oceanblue

Guest
Messages
1,383
Location
UK
Anyway Table 2 gives the figures for the US general population: 38% have a score of 100 on the physical functioning subscale (I wonder is the data out there for people of a working age). He was suggesting 100 could be used.

Also, not sure if the figure was given earlier but for the US general population (including very old people), this paper says the mean (SD) is 84.15 (23.28) which again makes one doubt the figures that the PACE Trial paper gave for the working age population:
Interesting paper, and it would be so good to get accurate data on the working age population.
 

oceanblue

Guest
Messages
1,383
Location
UK
High level of consent refusal

I've just been going through the numbers in fig 1 of PACE Trial inclusions/exclusions.
  • 37% of patients deemed suitable for research assessment declined assessment or randomisation
  • a further 10% of those who were assessed as suitable for the Trial declined to take part in it
That's nearly half refusing consent; the protocol assumed a third would refuse.

Screened for eligibility 3,158
Excluded by clinic doctor 1,698
Candidates for Trial assessment 1,460
No consent 533 (37% of candidates)
[no recorded reason for exclusion 29]

Assessed for research 898
excluded by research assessor 176
Candidates for Trial 722
No consent 69 (10% of candidates)
[no recorded reason for exclusion 12]
proceeded to trial 641
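
The arithmetic above can be re-run with a few lines of Python. The counts are exactly those listed; the variable names are mine, and the pooled figure at the end is just one possible way of combining the two refusal stages.

```python
# Counts as listed above (fig 1 of the PACE paper, per the post).
screened = 3158
excluded_by_clinic = 1698
candidates_for_assessment = screened - excluded_by_clinic

no_consent_assessment = 533
no_reason_assessment = 29
assessed = candidates_for_assessment - no_consent_assessment - no_reason_assessment

excluded_by_assessor = 176
candidates_for_trial = assessed - excluded_by_assessor

no_consent_trial = 69
no_reason_trial = 12
randomised = candidates_for_trial - no_consent_trial - no_reason_trial

pct_refused_assessment = 100 * no_consent_assessment / candidates_for_assessment
pct_refused_trial = 100 * no_consent_trial / candidates_for_trial
# Both refusal stages pooled over the original candidates (one possible reading).
pct_refused_overall = 100 * (no_consent_assessment + no_consent_trial) / candidates_for_assessment

print(candidates_for_assessment, assessed, candidates_for_trial, randomised)
print(round(pct_refused_assessment, 1),
      round(pct_refused_trial, 1),
      round(pct_refused_overall, 1))
```

The stage totals reproduce 1,460, 898, 722 and 641, and the two refusal rates round to 37% and 10%. Pooled over the 1,460 candidates, refusals come to about 41%.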

These figures seem pretty high to me. thoughts, anyone?
 

oceanblue

Guest
Messages
1,383
Location
UK
How many real cases of CDC/International Criteria CFS?

I think Dolphin has already pointed out that PACE have appeared to diagnose 'International Criteria' CFS (2003 update of Fukuda) by simply asking participants if they have experienced the symptoms in the past week.

Here's how they should have done it, according to the 2003 paper:
Reeves 2003 (ref for PACE) said:

Definition and Evaluation of Accompanying Symptoms
The 1994 case definition defines CFS by the presence of
debilitating fatigue accompanied by at least 4 of 8 designated
symptoms. Accompanying symptoms must have
persisted or recurred during 6 or more consecutive
months of illness and cannot have predated the fatigue.

These symptoms are non-specific and variable in both
nature and severity over time. ... We recommend that research studies use
the SPHERE (discussed below) to query subjects (cases
and controls) about the occurrence, duration, and severity
of the 8 case defining symptoms and other potentially
accompanying symptoms

So by only asking about the last week they risk:
1. Falsely INCLUDING those who have temporary symptoms unrelated to CFS (nb CFS symptoms are pretty generic and can have many other causes)
2. Falsely EXCLUDING those who have had the symptoms frequently over the correct 6-month period but not in the last week.

This doesn't mean their 'International Criteria' (IC) cohort is completely unrelated to what would have been found with correct implementation, but it does suggest it might not be a very accurate categorisation, which makes any findings about this cohort less reliable.

Some more speculation
Around 60% of participants, all of whom are Oxford Criteria-diagnosed, also had an IC diagnosis. That seems high since the only published prevalence study using Oxford I know of (Wessely, of course) found a prevalence of 2.5%, while other studies and estimates based on CDC/IC criteria put the prevalence around 0.5%, 5x lower. So I would have expected IC to be well under half the rate of Oxford Criteria.

Also, we know that in the initial eligibility screening, there were 266 cases of clinician-diagnosed CFS that didn't meet Oxford Criteria (vs 2,147 that did), and so didn't make it into the Trial (about 12%). Quite possible many of these met CDC/IC criteria and if we add these back in, it looks like the prevalence rate for CDC/IC could be maybe 2/3 that of Oxford, and that looks too high to me.
 

Dolphin

Senior Member
Messages
17,567
I think Dolphin has already pointed out that PACE have appeared to diagnose 'International Criteria' CFS (2003 update of Fukuda) by simply asking participants if they have experienced the symptoms in the past week.

Here's how they should have done it, according to the 2003 paper:

So by only asking about the last week they risk:
1. Falsely INCLUDING those who have temporary symptoms unrelated to CFS (nb CFS symptoms are pretty generic and can have many other causes)
2. Falsely EXCLUDING those who have had the symptoms frequently over the correct 6-month period but not in the last week.

This doesn't mean their 'International Criteria' (IC) cohort is completely unrelated to what would have been found with correct implementation, but it does suggest it might not be a very accurate categorisation, which makes any findings about this cohort less reliable.

Some more speculation
Around 60% of participants, all of whom are Oxford Criteria-diagnosed, also had an IC diagnosis. That seems high since the only published prevalence study using Oxford I know of (Wessely, of course) found a prevalence of 2.5%, while other studies and estimates based on CDC/IC criteria put the prevalence around 0.5%, 5x lower. So I would have expected IC to be well under half the rate of Oxford Criteria.

Also, we know that in the initial eligibility screening, there were 266 cases of clinician-diagnosed CFS that didn't meet Oxford Criteria (vs 2,147 that did), and so didn't make it into the Trial (about 12%). Quite possible many of these met CDC/IC criteria and if we add these back in, it looks like the prevalence rate for CDC/IC could be maybe 2/3 that of Oxford, and that looks too high to me.
Good you're looking at this.
Actually, in those that did take part, 67% satisfied the International criteria (see Table 1). "As randomised" refers to the fact that they were trying to spread them out through the various groups but they did this on slightly inaccurate information, "Actual" is the correct line.
 

oceanblue

Guest
Messages
1,383
Location
UK
Actually, in those that did take part, 67% satisfied the International criteria (see Table 1). "As randomised" refers to the fact that they were trying to spread them out through the various groups but they did this on slightly inaccurate information, "Actual" is the correct line.

Good point, so the overlap between Oxford and International Criteria cohorts is huge, and does suggest that something is wrong. Particularly when we know they failed to implement the International Criteria according to the source they reference.
 

oceanblue

Guest
Messages
1,383
Location
UK
International Criteria vs Oxford Criteria results

The authors state that:
Participant subgroups meeting international criteria for
chronic fatigue syndrome, London criteria for myalgic
encephalomyelitis, and depressive disorder criteria did
not differ in the pattern of treatment effects (figure 2; all
p interactions were non-significant

This looked suspect to me as the graphs in figure 2, for SF36 at least, appeared to show that the difference between SMC and GET/CBT was not significant. So to look more closely, I estimated the data from the graph. Because the baseline figures for SMC are quite a bit higher than those for CBT/GET it's misleading just to look at the 52 weeks endpoint - and it does appear that the results for international criteria are broadly in line with those for Oxford Criteria.

International Criteria results
SF-36 data: 52 week mean - Baseline mean=net difference
CBT: 56.6-37.4=+19.2 (vs +18.9 for Oxford Criteria)
GET: 57.2-37.3=+19.9 (vs +20.9 for Oxford Criteria)
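
The same change-from-baseline comparison as a small sketch, using only the values given above (graph estimates for the International Criteria subgroup, and the Oxford Criteria changes quoted alongside them). The names are illustrative.

```python
# Means estimated from figure 2 (International Criteria subgroup), as given above.
estimates = {
    "CBT": {"baseline": 37.4, "week_52": 56.6},
    "GET": {"baseline": 37.3, "week_52": 57.2},
}
# Net changes for the full Oxford Criteria cohort, as quoted above.
oxford_change = {"CBT": 18.9, "GET": 20.9}

def net_change(arm):
    """52-week mean minus baseline mean, rounded to one decimal place."""
    return round(arm["week_52"] - arm["baseline"], 1)

for name, arm in estimates.items():
    change = net_change(arm)
    gap = round(change - oxford_change[name], 1)
    print(name, change, "vs Oxford", oxford_change[name], "gap", gap)
```

Comparing changes rather than endpoints avoids being misled by arms that start from different baselines.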

Differences compared with SMC are slightly lower, but only because the SMC net difference scores for London Criteria were lower than for Oxford Criteria.

Of course, whether or not the PACE definition of 'International Criteria' is meaningful is another question altogether.
 

Dolphin

Senior Member
Messages
17,567
The authors state that:

This looked suspect to me as the graphs in figure 2, for SF36 at least, appeared to show that the difference between SMC and GET/CBT was not significant. So to look more closely, I estimated the data from the graph. Because the baseline figures for SMC are quite a bit higher than those for CBT/GET it's misleading just to look at the 52 weeks endpoint - and it does appear that the results for international criteria are broadly in line with those for Oxford Criteria.

International Criteria results
SF-36 data: 52 week mean - Baseline mean=net difference
CBT: 56.6-37.4=+19.2 (vs +18.9 for Oxford Criteria)
GET: 57.2-37.3=+19.9 (vs +20.9 for Oxford Criteria)

Differences compared with SMC are slightly lower, but only because the SMC net difference scores for London Criteria were lower than for Oxford Criteria.

Of course, whether or not the PACE definition of 'International Criteria' is meaningful is another question altogether.
Maybe you could calculate the SMC. I calculated the difference in 2F would be less than 5.5 (4mm) and in 2G around 6.9.

Somebody pointed out the following to me:
When 95% CI arms cross, that can mean the p-value calculated for the difference between means will be non-significant. For example, Fig. 2B shows no overlap of CI arms at 52 weeks if you compare SMC to GET or CBT, so we can conclude there is a significant difference. However, Fig. 2F shows clearly that the SMC upper CI arms overlap with the GET and CBT lower CI arms. A similar pattern is also seen in Fig. 2C and Fig. 2G. Taking these four figures into account, the decrease in fatigue is statistically significant, but there might be no significant difference in measured physical functioning.

About 95% CI arm overlap: http://www.cscu.cornell.edu/news/statnews/stnews73.pdf
How to eyeball CIs in figures - interesting! : http://www.ncbi.nlm.nih.gov/pubmed/18991332
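
One caution when eyeballing CI arms: a small overlap does not by itself show that a difference is non-significant. Below is a minimal sketch, assuming two independent groups, a normal approximation, and that each arm is a symmetric 95% interval. The numbers are hypothetical and are not read off the PACE figures.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def intervals_overlap(mean_a, half_width_a, mean_b, half_width_b):
    """True if the two 95% confidence intervals share any values."""
    return abs(mean_b - mean_a) < half_width_a + half_width_b

def difference_test(mean_a, half_width_a, mean_b, half_width_b):
    """Two-sided z-test for the difference of two independent means,
    given each mean and the half-width of its 95% confidence interval."""
    se_a = half_width_a / Z95
    se_b = half_width_b / Z95
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = (mean_b - mean_a) / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical example: intervals 45-55 and 53-63 overlap between 53 and 55.
group_a = (50.0, 5.0)
group_b = (58.0, 5.0)

print("overlap:", intervals_overlap(*group_a, *group_b))
z, p = difference_test(*group_a, *group_b)
print("z =", round(z, 2), "p =", round(p, 3))
```

In this made-up case the arms overlap and the difference is still significant at the 5% level, because the standard error of a difference is smaller than the sum of the two half-widths. Heavy overlap, on the other hand, does point towards a non-significant difference.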
If one looks at 2F and 2G, one seems to have enough information to say SMC isn't different from CBT and GET.

If SMC was slightly higher in 2F and 2G than in 2E, it could be sufficient to make the differences no longer significant.

One caveat is that these are unadjusted data but that may not matter. I do believe their overall single p-values* look at whether individual differences are still significant at each point.

*there are four time-points and four measures at each point, resulting in lots of comparisons that might be different