That's a very small sample size, so it would be difficult to reach any conclusion. But basically there are not 7 SNPs: there are two groups, Group A, which contains 3 SNPs, and Group B, which contains 4 SNPs. If someone has one +/+ SNP in Group B, roughly 99% of the time they have all the other +/+ Group B SNPs as well.

"a) There were 24 people in the table. Only one had 7 and another one had 4 SNPs. The remainder did not approach this."
Because +/+ is the more common genotype in both Group A and Group B, most people will have all 7 SNPs as +/+. Quite a few others will have 3 or 4 SNPs as +/+, depending on whether only Group A or only Group B is +/+. Some will have -/- for all. A small few will have a different number, because while the SNPs are strongly linked, they aren't 100% linked.
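The two-group picture can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the names `P_A_BLOCK`, `P_B_BLOCK` and `LEAK`, their values, and the assumption that Group A and Group B are inherited independently of each other are all hypothetical, chosen just to show why counts of 0, 3, 4 and 7 dominate.

```python
import random
from collections import Counter

# Hypothetical parameters (not taken from any study):
P_A_BLOCK = 0.75  # chance a person is +/+ across Group A (3 SNPs)
P_B_BLOCK = 0.75  # chance a person is +/+ across Group B (4 SNPs)
LEAK = 0.01       # chance any single SNP breaks from its group (imperfect linkage)


def count_plus_plus(rng: random.Random) -> int:
    """Return how many of the 7 SNPs are +/+ for one simulated person."""
    total = 0
    for size, p_block in ((3, P_A_BLOCK), (4, P_B_BLOCK)):
        block_is_pp = rng.random() < p_block  # the whole group moves together...
        for _ in range(size):
            snp_is_pp = block_is_pp
            if rng.random() < LEAK:           # ...except for the occasional stray SNP
                snp_is_pp = not snp_is_pp
            total += int(snp_is_pp)
    return total


def simulate(n_people: int, seed: int = 1) -> Counter:
    """Tally how many simulated people have 0..7 SNPs as +/+."""
    rng = random.Random(seed)
    return Counter(count_plus_plus(rng) for _ in range(n_people))


if __name__ == "__main__":
    counts = simulate(10_000)
    for k in range(8):
        print(k, counts[k])
```

Raising `LEAK` spreads more people into the in-between counts (1, 2, 5, 6), which corresponds to the "small few" whose SNPs aren't 100% linked.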
Which study are we referring to? One big problem with looking for genetic correlations in diseases is that there are billions of SNPs. If enough SNPs are looked at in a study, the odds are extremely good that a "statistically significant" result will turn up. But it might simply be random background noise, in line with the idea that 1000 monkeys typing for 1000 years might accidentally recreate the works of Shakespeare. The smaller the number of patients in a study, the bigger the chance that a "significant" result is a false positive.

"The authors were looking for subjects with strong symptoms, as they were trying to prove a hypothesis. From these numbers, surely the incidence of having 7 SNPs would be less than 'billions'? Or am I missing something? I know that 24 is hardly an authoritative sample size."
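The "look at enough SNPs and something will come up significant" problem is usually called the multiple-comparisons problem. A minimal sketch, assuming independent tests each run at a significance level `alpha` of 0.05 (both the independence and the threshold are illustrative assumptions): the chance of at least one false positive among `m` SNPs with no real effect is 1 - (1 - alpha)^m, and the Bonferroni correction is one common, conservative way to counter it.

```python
def family_wise_error_rate(m: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across m independent tests
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(m: int, alpha: float = 0.05) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / m


if __name__ == "__main__":
    for m in (1, 7, 20, 100, 1_000_000):
        print(f"{m:>9} SNPs tested: P(>=1 false positive) = "
              f"{family_wise_error_rate(m):.3f}, "
              f"Bonferroni cutoff = {bonferroni_threshold(m):.1e}")
```

With just 20 independent tests, the chance of at least one spurious "hit" is already about 64%.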
High linkage disequilibrium means that they are tightly linked. So if someone has one +/+ in Group A or Group B, they usually have the rest of the +/+ SNPs for that group. When the prevalence rates look different, a couple of things are usually happening. One is that there's almost always a different number of results for each SNP: even though everyone in the study is tested, genotyping often fails for several people on each SNP. Additionally, when the sample size is small, these differences end up looking huge.

"b) What did the study mean that some of the SNPs showed a high linkage disequilibrium in CFS? Does this mean the SNPs are not linked? Looking at the table, and again aware of the sample size, these SNPs don't appear to be overly linked."
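For anyone wondering what "high linkage disequilibrium" actually measures, here is a sketch of the two standard statistics, D' and r², for a pair of SNPs. It assumes phased haplotype frequencies are already known (estimating them from unphased genotypes is a separate step), and the function name and example frequencies are made up for illustration.

```python
from typing import Tuple


def ld_stats(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """Return (|D'|, r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    d = p_ab - p_a * p_b  # observed haplotype frequency minus what independence predicts
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0  # 1.0 = no recombinant haplotypes seen
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))  # 1.0 = one SNP perfectly predicts the other
    return d_prime, r2
```

A pair like `ld_stats(0.8, 0.8, 0.8)`, where the two alleles always travel together, gives D' and r² of 1; `ld_stats(0.56, 0.8, 0.7)`, where the haplotype frequency equals the product of the allele frequencies, gives 0 for both (up to floating-point rounding).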
This is where dbSNP comes in. They have much bigger samples of the general population, so it's easier to see how tightly linked SNPs really are. The "MAF" (Minor Allele Frequency) near the top is useful as a rough guide, but if you go to the bottom of the page for an SNP, they have actual genotype frequency submitted by various groups. Many of those groups have submitted info from the same 50-100 people (though some have 2000+) for a ton of SNPs, so the genotype or allele percentages there can then be directly compared between two or more SNPs.
What usually happens is that one or more racial groups will show complete linkage disequilibrium, while one or two other groups show minor variations but still a very high degree of linkage disequilibrium. So the SNPs in Group A or Group B can in principle be inherited individually, but in practice almost never are, especially in Europeans.
Allele prevalence is how many A's or G's, for example, are present for a single SNP in a group, with each person having two alleles. Genotype prevalence is how many people have AA, AG, or GG.

"c) Do you happen to know how the 'expected' allele becomes a fraction of the 'variant'? It is driving me nuts that the expected is 15% across all races while the variant is 85%! I can't reconcile this."
Observed genotype prevalence is given at the bottom of the page for SNPs at dbSNP. But it can also be estimated from the allele frequencies, which is useful if genotype data isn't available, or is only available for small groups.
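To make the difference between allele prevalence and genotype prevalence concrete, here is a sketch that computes both from genotype counts. The counts (75 TT, 23 CT, 2 CC out of 100 people) and the function names are hypothetical, picked so that the allele frequencies come out at .865 and .135.

```python
from collections import Counter
from typing import Dict


def genotype_frequencies(counts: Dict[str, int]) -> Dict[str, float]:
    """Share of people carrying each genotype."""
    n = sum(counts.values())
    return {g: c / n for g, c in counts.items()}


def allele_frequencies(counts: Dict[str, int]) -> Dict[str, float]:
    """Share of all alleles that are each letter; every person contributes two alleles."""
    alleles: Counter = Counter()
    for genotype, c in counts.items():
        for allele in genotype:  # "CT" adds one C and one T per person
            alleles[allele] += c
    total = sum(alleles.values())  # equals 2 * number of people
    return {a: c / total for a, c in alleles.items()}


if __name__ == "__main__":
    counts = {"TT": 75, "CT": 23, "CC": 2}  # hypothetical sample of 100 people
    print(genotype_frequencies(counts))
    print(allele_frequencies(counts))
```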
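Basically, you square the MAF (minor allele frequency) or the major allele frequency to get the expected prevalence of being homozygous for the minor or major allele. To get the expected prevalence of being heterozygous, you use (1-MAF)*MAF*2, which is the 2pq cross term from expanding (p + q)^2 = p^2 + 2pq + q^2, where p and q are the two allele frequencies. These are the Hardy-Weinberg proportions, so they assume the population is in Hardy-Weinberg equilibrium (roughly, random mating and no strong selection or migration).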
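Written out, with p the major allele frequency and q = 1 - p the MAF, the expected genotype proportions are:

```latex
(p + q)^2 = \underbrace{p^2}_{\text{major homozygote}}
          + \underbrace{2pq}_{\text{heterozygote}}
          + \underbrace{q^2}_{\text{minor homozygote}} = 1,
\qquad q = \text{MAF}, \quad p = 1 - q
```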
So if you look at rs2918419 at dbSNP, you'll see that the MAF is .135 for C. This means that CC = .135 x .135 = .018225, or about 1.8%. We know the other allele is T because dbSNP lists the alleles as A and G on the reverse strand, which means T and C if we read it on the forward strand. If the MAF for C is .135, the allele frequency for T is 1 - .135 = .865. Thus the calculated expected prevalence of TT is .865 x .865 = .748225, or about 75%. And CT should be (1 - .135) x .135 x 2 = .23355, or about 23%.
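The arithmetic for rs2918419 can be reproduced in a few lines of Python. This is a sketch: the only input is the MAF of .135 quoted from dbSNP, and the function name is made up for illustration.

```python
def hardy_weinberg_expected(maf: float) -> dict:
    """Expected genotype prevalences under Hardy-Weinberg equilibrium.

    maf: minor allele frequency (q); the major allele frequency is p = 1 - q.
    """
    q = maf
    p = 1.0 - q
    return {
        "major homozygote": p * p,    # e.g. TT
        "heterozygote": 2.0 * p * q,  # e.g. CT
        "minor homozygote": q * q,    # e.g. CC
    }


if __name__ == "__main__":
    expected = hardy_weinberg_expected(0.135)  # MAF for C at rs2918419, per dbSNP
    for genotype, share in expected.items():
        print(f"{genotype}: {share:.4f}")
```

The three values always sum to 1, which is a quick sanity check when doing this by hand.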
I think what might be tripping you up is that the MAF is almost 15%, but you're homozygous for the major allele, not the minor one. So the MAF is 13.5%, but you have the genotype with about 75% expected prevalence. Additionally, if you look at the smaller samples at dbSNP, that prevalence can be as high as 87.5%, but the bigger European sample says 77%, so it really looks like randomness is having a fairly big impact on the smaller sample.
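How much can chance move a small sample? A rough sketch: if the true TT prevalence were about 0.75, the binomial distribution gives the probability of seeing 87.5% or more TT by luck alone. The panel sizes below are hypothetical, since the sizes of the smaller dbSNP samples aren't given here, and the function name is made up.

```python
from math import ceil, comb


def prob_at_least(share: float, n: int, p: float) -> float:
    """P(observed share >= `share`) when each of n people is TT with probability p."""
    k_min = ceil(share * n - 1e-9)  # smallest count reaching the share (guards float error)
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


if __name__ == "__main__":
    for n in (8, 24, 120, 1000):  # hypothetical panel sizes
        print(f"n={n:>4}: P(>=87.5% TT | true 75%) = {prob_at_least(0.875, n, 0.75):.4f}")
```

With only 8 people, 87.5% or more TT turns up about 37% of the time even when the true prevalence is 75%; with a sample in the thousands it essentially never does.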