Having spent a good deal of time poring over this study for my film, I thought I would try to explain my understanding of its rationale and promise.
First, there is no way to scan an individual spinal fluid sample for all of its proteins. 95% of the sample consists of just 14 proteins and their abundant "noise" masks the "signal" of the majority of the remaining proteins. Only about 500 proteins were found among all of the individual samples (CFS / nPTLS / Controls). Pooling the samples and then depleting them of those 14 abundant proteins resulted in the detection of a total of almost 4350 additional, less abundant proteins. When the three pools were compared, distinct differences were seen. One of those differences was the set of 738 proteins found exclusively in the CFS pool.
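For anyone who finds it easier to think in code, the pool comparison can be pictured as a simple set difference. This is only a rough sketch of the idea, not the researchers' actual analysis pipeline; the function name `unique_to_pool` and the protein labels are placeholders I made up, not data from the paper.

```python
def unique_to_pool(target, *others):
    """Return the proteins detected in `target` but in none of the other pools."""
    return set(target) - set().union(*others)


# Placeholder protein labels for illustration only (not real results).
cfs_pool = {"protein_A", "protein_B", "protein_C", "protein_D"}
nptls_pool = {"protein_B", "protein_C", "protein_E"}
control_pool = {"protein_C", "protein_F"}

cfs_only = unique_to_pool(cfs_pool, nptls_pool, control_pool)
print(sorted(cfs_only))  # prints ['protein_A', 'protein_D']
```

In the study, the equivalent of `cfs_only` would be the 738 proteins seen only in the CFS pool.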
But since these are pooled samples, how can we tell which, if any, of these unique proteins are present in a significant number of individual CFS patients rather than having been contributed by a few "rogue" samples?
There is, in fact, a way to detect individual proteins in individual samples, if you know what particular protein you are looking for. The problem is that the human body can produce some 2 million different kinds of proteins. You can't go looking for biomarkers by taking random shots at 2 million targets. However, the comparison of pooled samples has reduced the 2 million targets to just 738.
It is possible that all 738 proteins unique to the CFS pool belong to only a small number of "rogue" samples (or perhaps to just one very unusual sample). The researchers think this is "unlikely" in light of the individual protein abundance results, which were able to distinguish the CFS group from the nPTLS group with a p value < 0.01. [The pooled study was not given a p value in the body of the paper, although I must admit that the abstract might make one think so.] The authors do not discuss how the individual results inform their confidence in the pooled results, but I assume that it is more than just a "leap of faith."
[One possible train of inductive reasoning might be that if the pattern of abundance differences seen among the individual samples (the p < 0.01 results) were also to be found among some of the less abundant proteins in the pools, then "low relative abundance" could translate to "no abundance" in one pool and "some abundance" in another, thus producing "unique" proteins in a particular pool. That's just a guess of mine, however.]
At any rate, study replication and further disease-specific analysis can narrow the target list even further, focusing in on the most promising potential biomarkers.
At that point, individual samples can be searched for candidate biomarkers using tools such as Selective Reaction Monitoring Mass Spectrometry [as mentioned in the 2011 paper]. It will then be possible to say if a candidate biomarker is prevalent among CFS patients, or if it has been contributed by only a small number of "rogues." All of this, however, relies on first establishing what those candidate biomarkers might be. That's what the 2011 pool comparison results contribute.
These initial results are just the first part of a lengthier biomarker discovery process. The steps of that process were outlined in the earlier 2010 paper that established the "Normal Human Cerebrospinal Proteome." That data was used as the control pool in the 2011 paper.
Here is a thumbnail of that process:
Step 1 - Pooled sample comparison study.
Step 2 - Select candidate proteins for examination in individual samples.
Step 3 - Analysis of individual samples for candidate proteins.
This step checks for "rogue" samples that might have disproportionately contributed to the patient pool.
Step 4 - Verification of independent samples of the same disease.
See if the results hold up in another cohort of CFS patients.
Step 5 - Final validation.
Confirmation testing run on a larger number of patients and controls using assays targeted to the candidate protein.
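The "rogue" check in Step 3 can be pictured roughly as follows. This is my own minimal sketch, not a method taken from either paper: the sample names, protein labels, the `prevalence` helper, and the 50% cutoff are all invented for illustration.

```python
def prevalence(candidate, samples):
    """Fraction of individual samples in which `candidate` was detected."""
    if not samples:
        raise ValueError("need at least one sample")
    hits = sum(1 for detected in samples.values() if candidate in detected)
    return hits / len(samples)


# Made-up detections per individual patient sample (illustration only).
samples = {
    "patient_1": {"X", "Y"},
    "patient_2": {"X"},
    "patient_3": {"X", "Z"},
    "patient_4": {"Z"},
}

MIN_PREVALENCE = 0.5  # arbitrary cutoff, chosen only for this example

for candidate in ["X", "Y", "Z"]:
    frac = prevalence(candidate, samples)
    verdict = "widespread" if frac >= MIN_PREVALENCE else "possibly from a rogue sample"
    print(f"{candidate}: found in {frac:.0%} of samples ({verdict})")
```

A candidate that turns up in only one or two patients, like `Y` here, is the kind that could have entered the pool by way of a single unusual sample.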
The 2011 paper represents Step 1. I would assume that the researchers have already started on Step 2. The potential implications of Step 3 are huge*, but we will have to wait to see if that potential is realized.
More detail on all of this can be found in the two papers, particularly in their discussion sections:
2010 Paper:
Establishing the Proteome of Normal Human Cerebrospinal Fluid
http://www.plosone.org/article/info:doi/10.1371/journal.pone.0010980
2011 Paper:
Distinct Cerebrospinal Fluid Proteomes Differentiate Post-Treatment Lyme Disease from Chronic Fatigue Syndrome
http://www.plosone.org/article/info:doi/10.1371/journal.pone.0017287
Looked at one way, pooling the samples might seem like a crude methodology, producing 738 candidate proteins of uncertain value. Looked at another way, the pooled study had the precision to single out 738 proteins from 2 million possibilities, narrowing the field of biomarker candidates to just 0.037% of its original size.
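For those who want to check the arithmetic behind that percentage, here is a quick sketch. The only inputs are the two figures already quoted above (738 candidates and roughly 2 million possible proteins); the variable names are mine.

```python
candidates = 738
possible_proteins = 2_000_000  # the rough "2 million" estimate quoted above

percent_remaining = candidates / possible_proteins * 100
fold_reduction = possible_proteins / candidates

print(f"{percent_remaining:.3f}% of the original field remains")
print(f"roughly a {fold_reduction:,.0f}-fold reduction")
```

Put the other way around, that is a reduction of roughly 2,700-fold in the number of targets.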
The ultimate value of this nascent result remains uncertain, but, as Benjamin Franklin once said when asked "of what use" were the early French experiments in manned balloon flight... "Of what use is a newborn baby?"
In other words, let's see what comes of this.
SuddenOnset
* In this video of the recent P2P Workshop meeting, you can see Dr. Natelson discussing the need to move the discovery process on to Step 3 at 1:02:20.
http://videocast.nih.gov/summary.asp?Live=14727&bhcp=1
P.S. Thanks @RustyJ for your blog about my video. It is much appreciated! This project was indeed a good deal of work, taking about 6 months to complete.