I am nowhere near up to speed on the subject of GcMAF in general, so please forgive and correct any inaccuracies in what follows...
ETA: Garcia has now done so...
My understanding is: in association with the GcMAF trial of Dr de Meirleir's patients, a genetic analysis was carried out, the main headline of which is that the 20% of patients who are failing to respond to GcMAF all have the same allele on one of the tests.
ETA: What I had misunderstood was the context of the data, which is nothing to do with GcMAF or Dr de Meirleir; instead the spreadsheet is a 'proof of concept' for an ongoing project (as Garcia explains below)...so I've removed the sections of this post that were based on that false premise...the rest of my analysis seems sound though...
When I saw the dataset of these genetic tests, I got really excited at the prospect of mining this raw data for clues - and I still am: that's what this thread is for: analysis of those results. Unfortunately, the work I've done so far shows quite clearly that at least one of the six statistically significant differences found between ME patients and controls is spurious.
Background on genetics
First: I've done a lot of reading around the subject in the last day or two, and there's a little bit of background on the science I'd like to present, to help those (like me) with an incomplete background in biology, to understand some of the basics of the subject...again please do correct any inaccuracies in what follows...
The genetic analysis is a study of Single Nucleotide Polymorphisms (SNPs). SNPs ("snips") are differences of a single element in the genetic code - so for example a subsection of genetic code like "...CGTCAGCG..." might appear in 20% of the population as "...CGTCAACG..." - some people have "A" at that position, others have "G".
Since we have two copies of the code, it's possible to have one copy of each of these variants (which are called alleles). Although it's my rough guess that it may be questionable in many cases to refer to either of these alleles as a "mutation", ancestral genetic analysis does allow one of these alleles to be identified as the "wild type" - the 'typical' form - and the other allele as a "mutation". This distinction is referenced using '+' and '-', and the normal convention is that '+' denotes the 'wild type' and '-' denotes the mutation.
So: since we have two copies of the code, each of us may either have two copies of the wild type, two copies of the mutation, or one of each, denoted as +/+, +/-, or -/-.
There has already been some confusion in the de Meirleir data, in that the positives and negatives are the wrong way round for one of the SNPs studied - and I believe it was Garcia who did some brilliant work in identifying and analysing this issue, and notified the researchers of that detail which caused quite a bit of confusion.
One last bit of preamble: in my reading about SNPs, I read that the average adult has about 1 million SNPs in their DNA. SNPs, then, are not generally to be seen as serious genetic deficiencies...whereas double nucleotide polymorphisms are liable to cause much more significant problems.
Understanding the spreadsheet
Next I'd like to offer a few hints for understanding the spreadsheet of data from the genetic analysis, which I found linked on one of the GcMAF threads. Here's the spreadsheet:
https://spreadsheets.google.com/ccc?key=0Ar76dNWyEQLIdEpJblFOTnU5NFVxRy1LLUFCN0dSOXc&hl=en
Across the top of the spreadsheet, the headers reference the various SNPs studied. The genes in which the SNP occurs are referenced by code names: ACE, CBS, COMT etc...and within each gene, a few known SNPs are studied, one in each column - referenced by the code number for the SNP, such as rs1799752. The next line in the spreadsheet shows, for each SNP, which allele is considered to be the '+' and which is the '-'. So: +T/-C means that the variation with a "T" is considered to be '+' (wild type) and the variation with a "C" is considered to be '-' (mutation).
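To make the '+'/'-' header convention concrete, here is a minimal sketch of how a header such as +T/-C could be used to translate a raw genotype into the +/+, +/-, -/- notation. The function names are my own invention, and the sketch assumes single-letter alleles, so it would not cope with insertion/deletion variants as written.

```python
def parse_header(label):
    """Turn a header like '+T/-C' into {'T': '+', 'C': '-'}."""
    plus, minus = label.split("/")
    if not (plus.startswith("+") and minus.startswith("-")):
        raise ValueError(f"unexpected header format: {label!r}")
    return {plus[1:]: "+", minus[1:]: "-"}


def to_sign_notation(genotype, label):
    """Translate a two-letter genotype such as 'CT' into '+/-' notation.

    Assumes each allele is a single letter (hypothetical simplification).
    """
    signs = parse_header(label)
    # Put '+' before '-' so that 'CT' and 'TC' both come out as '+/-'.
    calls = sorted((signs[allele] for allele in genotype), key="+-".index)
    return "/".join(calls)


if __name__ == "__main__":
    for genotype in ("TT", "CT", "CC"):
        print(genotype, "->", to_sign_notation(genotype, "+T/-C"))
```

If the '+' and '-' in a header are the wrong way round, as happened with one of the SNPs, every translated result for that column flips with it.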
Reading down the spreadsheet, next follows the full results for all 49 patients studied. And then, we get to the statistical analysis...
Three rows show the percentages of each of the 3 types (+/+, +/-, -/-) in CFS patients, then the next 3 rows show the percentages for the controls. After that follow the raw numbers for this data: the total number (n) of individuals studied, and the numbers for each of the 3 types.
Finally, the p-values follow. Each is the probability of seeing a difference at least as large as the one observed between the ME patients and the controls if there were really no difference between the two groups, so the lower the p-value, the harder the result is to put down to chance. Some of those values are so low that they round to zero in the spreadsheet.
The colour coding of the spreadsheet helps to separate the genes studied, but also indicates the strengths of association observed - the headers and p-values for each column are coloured black for those SNPs where the results are statistically significant (the lowest p-values).
So it's the columns headed in black that we're most interested in...
AHCY-01 variance
I've only studied one of the SNPs so far: the data that jumped out at me as being most interesting, based on the numbers alone...
For the AHCY-01 SNP, a p-value of 0.0172 is observed: only about a 2% chance of seeing a difference this large through random chance alone.
The percentages work out as follows:
Patients: 78.8% +/+ 21.2% +/- 0% -/-
Controls: 54.6% +/+ 33.0% +/- 12.4% -/-
What's very striking is that not one of the patients carries the -/- variation, which is expected in 12.4% of the population. This becomes even more striking when you look at the other AHCY SNPs, which have less statistical significance, but still show variance with p-values of 0.06 and 0.1 (rounded) - when you look at the numbers for those, again no patients have the -/- form, where rates of about 12% are expected.
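The percentages quoted above can be turned back into head counts, which is useful for any further arithmetic. This is a rough sketch that searches for group sizes whose rounded percentages match the quoted figures; it assumes the spreadsheet rounds to one decimal place, and the function name is my own. On my reading it points to 33 patients genotyped for this SNP (26, 7 and 0) rather than the full 49, but treat that as an inference from the rounding, not something the spreadsheet states.

```python
def consistent_counts(percentages, max_n):
    """Find group sizes n (and per-genotype counts) that reproduce the
    quoted percentages when rounded to one decimal place."""
    matches = []
    for n in range(1, max_n + 1):
        # The only count that can round to p% of n is the nearest integer.
        counts = [round(p * n / 100) for p in percentages]
        if sum(counts) != n:
            continue
        if all(round(100 * c / n, 1) == p for c, p in zip(counts, percentages)):
            matches.append((n, counts))
    return matches


# Percentages for +/+, +/-, -/- as quoted in the post.
patients = consistent_counts([78.8, 21.2, 0.0], max_n=49)
controls = consistent_counts([54.6, 33.0, 12.4], max_n=97)

if __name__ == "__main__":
    print("patients:", patients)
    print("controls:", controls)
```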
This sort of correlation seems really exciting. When I saw it, I speculated that the -/- form is something that ME/CFS patients just don't have...it could be something that defines the population studied, that we don't have this allele...so maybe, this means that this allele protects against CFS?
Sadly, no.
My next step was to google my way to the database references for these polymorphisms - you can google 'rs1799752' and find a data sheet for that SNP. So you can see what proteins are encoded by that section of genetic code, what the gene itself does, what disease associations are known for the SNP variants, etc etc. Note that in this case, the SNPs studied relate to aspects of the methylation process, because that, of course, is what the researchers were exploring.
And my googling led me to a most disappointing explanation for this particular variation in the data...
AHCY-01 variance is explained by racial profile
Finally...the exciting, yet at the same time disappointing discovery that I made last night...
I came across this page, which I'm afraid seems to me to kill this apparent genetic variance of ME patients stone dead:
http://www.ncbi.nlm.nih.gov/SNP/snp_retrieve.cgi?subsnp_id=ss48292451
The significant data is at the bottom of the page, headed "Population Allele Frequency Batch", where one can see that this is clearly the same data set that was used for the control data in the case of AHCY-01.
The first batch under that heading clearly shows that the numbers are the same as those in the spreadsheet, where n=97 (no of chromosomes sampled 194, 2 for each subject), and the percentages match the spreadsheet exactly: 54.6%, 33%, 12.4% - the spreadsheet data comes from this P1 batch, then:
Handle|PopulationID: SNP500CANCER|P1
No. of Chromosomes Sampled: 194
Allele: A=0.711/G=0.289
Genotype: AG=0.33/AA=0.546/GG=0.124
The next set of batches listed, it turns out, are subsets of the first batch, grouped by ethnicity: CAUC1 is "Caucasian", AFR1 is "African/African American", HISP1 is "Hispanic" and PAC1 is "Pacific Rim":
Handle|PopulationID: SNP500CANCER|CAUC1
No. of Chromosomes Sampled: 58
Allele: A=0.793/G=0.207
Genotype: AG=0.345/AA=0.621/GG=0.034
Handle|PopulationID: SNP500CANCER|AFR1
No. of Chromosomes Sampled: 46
Allele: A=0.391/G=0.609
Genotype: AG=0.522/AA=0.13/GG=0.348
Handle|PopulationID: SNP500CANCER|HISP1
No. of Chromosomes Sampled: 42
Allele: A=0.833/G=0.167
Genotype: AG=0.143/AA=0.762/GG=0.095
Handle|PopulationID: SNP500CANCER|PAC1
No. of Chromosomes Sampled: 48
Allele: A=0.813/G=0.187
Genotype: AG=0.292/AA=0.667/GG=0.041
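As a sanity check on the reading that the four ethnic batches are subsets of P1, the figures above can be tested for internal consistency. This sketch copies the numbers from the batches quoted above and checks two things: that each allele frequency follows from the genotype frequencies (A = AA + AG/2), and that the four subgroups, weighted by sample size, pool back to the P1 figures. The dictionary layout and function names are my own.

```python
# Genotype frequencies copied from the dbSNP batches quoted in the post.
batches = {
    "P1":    {"chromosomes": 194, "AA": 0.546, "AG": 0.330, "GG": 0.124},
    "CAUC1": {"chromosomes": 58,  "AA": 0.621, "AG": 0.345, "GG": 0.034},
    "AFR1":  {"chromosomes": 46,  "AA": 0.130, "AG": 0.522, "GG": 0.348},
    "HISP1": {"chromosomes": 42,  "AA": 0.762, "AG": 0.143, "GG": 0.095},
    "PAC1":  {"chromosomes": 48,  "AA": 0.667, "AG": 0.292, "GG": 0.041},
}
subgroups = ["CAUC1", "AFR1", "HISP1", "PAC1"]


def allele_a_frequency(batch):
    """Each AA person carries two A alleles, each AG person carries one."""
    return batch["AA"] + batch["AG"] / 2


def pooled_frequency(names, genotype):
    """Sample-size-weighted average of a genotype frequency across batches."""
    total = sum(batches[name]["chromosomes"] for name in names)
    weighted = sum(
        batches[name]["chromosomes"] * batches[name][genotype] for name in names
    )
    return weighted / total


if __name__ == "__main__":
    for name, batch in batches.items():
        print(f"{name}: A = {allele_a_frequency(batch):.3f}")
    for genotype in ("AA", "AG", "GG"):
        print(f"pooled {genotype} = {pooled_frequency(subgroups, genotype):.3f}")
```

The subgroup chromosome counts add up to 194, and the pooled genotype frequencies land within rounding error of the P1 row, which is what you would expect if P1 is simply the four groups combined.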
And so, at last, I come to the point...
In the Caucasian batch, the distribution of this AHCY-01 SNP is very similar to the distribution in the ME patients studied. Whereas in the AFR1 group the prevalence of the GG genotype is 34.8%, in the CAUC1 group it's just 3.4%. The spreadsheet is comparing with the overall level of GG - 12.4% - but the ethnic breakdown in the more detailed data for that batch tells us that GG is far more common in the African/African American group than in any of the others, so the pooled figure overstates how common it is among Caucasians.
That 3.4% in the CAUC1 group is 3.4% of the 29 caucasians studied: ie. 1/29.
I think it's pretty obvious from the above data that the true explanation for the variation in the AHCY-01 gene found by this study lies in geographic/ethnic variance.
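To put a rough number on that, here is a back-of-envelope sketch of how likely it is to see no GG carriers at all in the patient group under each reference rate. It is a simple binomial calculation, not a reconstruction of whatever test the spreadsheet used. The patient group size of 33 is my assumption, inferred from the quoted percentages, and the comparison only makes sense if the patients were mostly of European descent, which the data does not actually state.

```python
def prob_no_carriers(rate, n):
    """Chance that none of n independent people carry a genotype
    that occurs at the given rate in the reference population."""
    return (1 - rate) ** n


# ASSUMPTION: 33 patients genotyped for AHCY-01 (inferred, not stated).
n_patients = 33

# GG genotype frequencies from the dbSNP batches quoted in the post.
reference_rates = {"P1 (all groups pooled)": 0.124, "CAUC1": 0.034}

results = {
    label: prob_no_carriers(rate, n_patients)
    for label, rate in reference_rates.items()
}

if __name__ == "__main__":
    for label, probability in results.items():
        print(f"{label}: P(no GG among {n_patients}) = {probability:.3f}")
```

Against the pooled 12.4% rate, finding zero GG patients is a little over a 1% event, which looks remarkable. Against the 3.4% CAUC1 rate it is roughly a one-in-three event, which is not remarkable at all.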
What does this mean for the study as a whole?
I can't begin to tell you how disappointed I am that what I've found during the course of my investigation casts doubt on the results of the study as a whole, because this is absolutely not what I was hoping to do! But this analysis does highlight a serious flaw in the data in that spreadsheet. The comparisons the spreadsheet makes between the patients and the controls are not comparisons with matched controls, but instead they are comparisons with overall data from the US. Thus it now seems to me that all of the associations found in the study are suspect: perhaps they are all explained by geographical differences, differences of ethnicity, etc.
So the top priority seems to me to be obtaining better matched control data...and I understand that those involved with this work are working on that issue...
ETA: Deleted the rest of this post, since it was based on the false premise that this was related to the GcMAF trial.