Bob
This is something I worked on some time ago, but never got around to posting it.
I might have made a mistake or two, as I'm going from memory, and I'm a novice statistician.
I'm hoping that Graham, or any other statisticians, will pick up any errors.
Usually, a 'normal range' (more appropriately known as a 'reference range') is calculated as the mean +/- 2 SD (more precisely, 1.96 SD) of a well-defined population, as per this helpful Wikipedia page:
http://en.wikipedia.org/wiki/Reference_range
A 'normal range' analysis is intended to exclude the top and bottom outliers of a well-defined population. If the values are normally distributed, it cuts off the top and bottom 2.5% of values (5% in total), leaving the middle 95% as an indication of 'normal' values for that population. For example, it could be used to describe the 'normal' walking speeds of healthy women of the same height.
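For anyone who wants to see the arithmetic, here's a minimal Python sketch of the conventional parametric reference range, using only the standard library's `statistics` module. The walking-speed numbers are made up, and `parametric_reference_range` is just a hypothetical helper name. It also shows where the '2 SD' comes from: the exact multiplier that leaves 2.5% in each tail of a normal distribution is about 1.96.

```python
from statistics import NormalDist, mean, stdev

# z such that 2.5% of a normal distribution lies above mean + z*SD
# (and, by symmetry, 2.5% lies below mean - z*SD); ~1.96, often rounded to 2
Z_95 = NormalDist().inv_cdf(0.975)


def parametric_reference_range(values, z=Z_95):
    """Hypothetical helper: return (lower, upper) = mean -/+ z * SD.

    Only meaningful if `values` come from a roughly normal distribution.
    """
    m = mean(values)
    sd = stdev(values)  # sample standard deviation
    return m - z * sd, m + z * sd


# Hypothetical walking speeds (m/s) for a small, well-defined group: made-up numbers
speeds = [1.2, 1.3, 1.4, 1.3, 1.2, 1.4]
low, high = parametric_reference_range(speeds)
print(f"z = {Z_95:.3f}")
print(f"reference range: {low:.2f} to {high:.2f} m/s")
```

The whole calculation rests on the normality assumption: the 1.96 multiplier only corresponds to 2.5% per tail if the data really are normal.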
It isn't appropriate to use this calculation (i.e. to use standard deviations) with SF-36 PF normative data, because the data does not have a normal distribution (it has a skewed distribution). When data isn't normally distributed, mean +/- SD doesn't cut off the expected percentages in each tail, and on a scale bounded at 0 and 100 the upper limit can even exceed the maximum possible score, so the analysis becomes meaningless. This is well known to all statisticians. (Except, apparently, those working at the Lancet.)
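To illustrate why skewness matters, here's a sketch using a small, made-up set of scores on the SF-36 PF's 0-100 scale (in steps of 5), with most people at or near the ceiling and a few very low scores. These are NOT real normative data; they're only meant to mimic the general shape of a skewed, ceiling-limited distribution. The code counts how many scores actually fall outside mean +/- 1 SD and mean +/- 2 SD.

```python
from statistics import mean, stdev

# Hypothetical SF-36 PF-style scores (0-100 scale, steps of 5) with a ceiling
# effect: most people score at or near 100, a few score very low.
# Made-up numbers for illustration only, NOT real normative data.
scores = [0, 5, 10, 70, 75, 80, 85, 85, 90, 90, 90,
          95, 95, 95, 95, 95] + [100] * 10

m = mean(scores)
sd = stdev(scores)  # sample standard deviation
n = len(scores)

for k in (1, 2):
    lower, upper = m - k * sd, m + k * sd
    # proportions actually falling outside the +/- k SD limits
    below = sum(s < lower for s in scores) / n
    above = sum(s > upper for s in scores) / n
    print(f"mean +/- {k} SD: {lower:.1f} to {upper:.1f} "
          f"({below:.1%} below, {above:.1%} above)")
```

For this hypothetical sample, both upper limits come out above 100, so nobody can be in the top 'tail' at all, and the proportion below mean - 2 SD is 11.5% rather than the 2.5% a normal distribution would give.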
The PACE Trial used what seems to be a very unusual methodology: it defined the 'normal range' as the mean +/- 1 SD of the general population (instead of +/- 2 SD of a well-defined group).
So, in theory, if the values were normally distributed, this would cut off the top and bottom 16% (32% in total) of the general population, leaving the middle 68%.
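The 16% and 68% figures come straight from the standard normal distribution. A quick check with Python's `statistics.NormalDist` (which, again, only tells us anything if the data really are normal):

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

for k in (1, 1.96, 2):
    tail = std_normal.cdf(-k)  # fraction below mean - k*SD (same fraction above mean + k*SD)
    inside = 1 - 2 * tail      # fraction within mean +/- k*SD
    print(f"+/- {k} SD: {tail:.1%} in each tail, {inside:.1%} inside")
```

This prints roughly 15.9% per tail and 68.3% inside for 1 SD, 2.5% and 95.0% for 1.96 SD, and 2.3% and 95.4% for 2 SD; the '16%' is a rounding of 15.9%.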
I'm not sure what this methodology is intended to demonstrate, or what evidence it is based on, as I can't find this methodology described in any scientific literature.
It isn't appropriate to use this calculation for SF-36 PF scores, because the data isn't normally distributed. So, before we even do the calculation, we know it's going to give us meaningless results, as it proved to do.
If we assume that the 'normal range' in the PACE Trial was intended to cut off the bottom 16% of the general population's values (as the mean minus 1 SD would do for normally distributed data), then the statistically appropriate threshold for their inappropriate normal range analysis would be the 16th percentile of the general population's scores, not the mean minus 1 SD.
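Here's a minimal sketch of that comparison, using made-up scores with a ceiling effect (not real SF-36 data). It computes the threshold both ways: the mean minus 1 SD, and the empirical 16th percentile via `statistics.quantiles`. There are several conventions for estimating percentiles from a sample; the `method="inclusive"` choice here is an assumption, and other methods give slightly different values for small samples.

```python
from statistics import mean, quantiles, stdev

# Hypothetical, made-up scores on a 0-100 scale with a ceiling effect
# (NOT real SF-36 PF normative data)
scores = [0, 5, 10, 70, 75, 80, 85, 85, 90, 90, 90,
          95, 95, 95, 95, 95] + [100] * 10

# Parametric threshold: only corresponds to the 16th percentile if data are normal
mean_minus_1sd = mean(scores) - stdev(scores)

# Empirical threshold: the 16th percentile of the observed scores.
# quantiles(n=100) returns the 1st..99th percentiles, so index 15 is the 16th.
p16 = quantiles(scores, n=100, method="inclusive")[15]

print(f"mean - 1 SD:     {mean_minus_1sd:.1f}")
print(f"16th percentile: {p16:.1f}")
```

For this hypothetical sample, the mean minus 1 SD comes out at about 53, while the 16th percentile is 75, so the two thresholds would classify people quite differently.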
But in any case, their analysis is meaningless for all the other reasons we know about. I'm just exploring the issue.
A common methodology might cut off the bottom 2.5% of values of the healthy population to define the 'normal range'. But even this has problems, because the 'healthy' population is not a very well-defined group.
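A nonparametric version of that lower cut-off just takes the 2.5th percentile of the reference group's scores, with no normality assumption. A sketch, using a hypothetical and very small 'healthy' sample (made-up numbers):

```python
from statistics import quantiles

# Hypothetical scores from a "healthy" reference group: made-up numbers
healthy = [70, 75, 80, 85, 85, 90, 90, 90,
           95, 95, 95, 95, 95] + [100] * 8

# quantiles(n=40) gives 39 cut points at 2.5%, 5.0%, ..., 97.5%
cuts = quantiles(healthy, n=40, method="inclusive")
lower_limit = cuts[0]    # 2.5th percentile
upper_limit = cuts[-1]   # 97.5th percentile

print(f"2.5th percentile:  {lower_limit}")
print(f"97.5th percentile: {upper_limit}")
```

For these numbers the lower limit is 72.5 and the upper limit is 100, i.e. the ceiling, which is why only the lower cut-off is informative here. With so few values the 2.5th percentile is essentially the average of the two lowest scores, so a real reference range would need a much larger, well-defined sample.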