Could you explain this further? In an old post of yours, I believe you said when analyzemygenes runs with the 1% database, it is finding mutations that less than 1 in 10000 people have. I may have interpreted your post wrong. Why doesn't this 1% figure mean 1 in 100 people?
The 1% file is looking at SNPs with a prevalence of 1% (0.01) or less. Homozygous results returned from that database would have a prevalence of 0.01 squared (0.01 x 0.01). That works out to 0.01% (0.0001), or 1 in 10,000.
If allele prevalence is 10%, then the homozygous genotype prevalence is 0.1 x 0.1 = 0.01, or 1%. So "rare" mutations listed above have a much higher allele prevalence than the 1% in the rare gene analysis program - going up to 31.6% allele rate for homozygous mutations to be at 10% or less, and going up to 5.3% allele rate for heterozygous mutations to be at 10% or less.
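As a rough check on the 31.6% and 5.3% figures, the genotype formulas (homozygous = MAF x MAF, heterozygous = 2 x (1 - MAF) x MAF) can be solved backwards for the allele frequency. This is only a sketch: the function names are made up for illustration and are not part of analyzemygenes.

```python
import math

def max_allele_freq_homozygous(genotype_limit):
    # Solve p * p = genotype_limit for p.
    return math.sqrt(genotype_limit)

def max_allele_freq_heterozygous(genotype_limit):
    # Solve 2 * p * (1 - p) = genotype_limit for the smaller root p.
    # Heterozygous frequency peaks at 0.5, so larger limits have no solution.
    if not 0.0 <= genotype_limit <= 0.5:
        raise ValueError("genotype_limit must be between 0 and 0.5")
    return (1 - math.sqrt(1 - 2 * genotype_limit)) / 2

hom_cutoff = max_allele_freq_homozygous(0.10)
het_cutoff = max_allele_freq_heterozygous(0.10)
print(f"homozygous cutoff:   {hom_cutoff:.1%}")
print(f"heterozygous cutoff: {het_cutoff:.1%}")
```

With a 10% genotype limit, the cutoffs come out at about 31.6% and 5.3%, matching the numbers above.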
So this isn't looking at super duper rare mutations (that's what the rare genes analysis program does), but rather looking for relatively rare genotypes shared by ME/CFS patients when compared to controls. That way trends across a gene (or multiple genes with a similar function) can be spotted, instead of just identical very rare SNPs shared by multiple patients.
And I was looking at dbSNP and (if I am not interpreting it wrong) they said the frequency number they post is the middle number, not the lowest number. So is analyzemygenes also using the middle number and not lowest?
Both dbSNP and the rare genes analysis program use 1000 Genomes as the data source for minor allele frequency. Hence they should be the same, unless dbSNP hasn't gotten the most recent update from 1000 Genomes.
And further can someone confirm that middle number means heterozygous mutation frequency? I'm trying to piece this together. If we are examining a specific SNP of a gene, and 1 in 4000 people have a heterozygous mutation and 1 in 10000 people have the homozygous mutation, then it is 1% for homozygous, but analyzemygenes will put 2.5%.
Analyzemygenes is indicating minor allele frequency (MAF). Genotype frequency can be calculated from the MAF:
Homozygous = MAF x MAF
Heterozygous = 2 x (1 - MAF) x MAF
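Here is a minimal sketch of those two formulas in Python, in case it helps to plug in your own numbers. They assume Hardy-Weinberg proportions (the usual textbook assumption for turning allele frequency into genotype frequency), and the function name is hypothetical rather than anything from analyzemygenes.

```python
def genotype_frequencies(maf):
    # maf: minor allele frequency as a fraction (0.01 means 1%).
    if not 0.0 <= maf <= 1.0:
        raise ValueError("maf must be between 0 and 1")
    homozygous = maf * maf                 # two copies of the minor allele
    heterozygous = 2 * (1 - maf) * maf     # one copy of the minor allele
    return homozygous, heterozygous

hom, het = genotype_frequencies(0.01)
print(f"homozygous:   {hom:.4%} (about 1 in {round(1 / hom):,})")
print(f"heterozygous: {het:.4%} (about 1 in {round(1 / het):,})")
```

For a MAF of 1% this gives a homozygous frequency of 0.01% (1 in 10,000) and a heterozygous frequency of 1.98%.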
Edited to add: I am further confused because of mutations which are only in a certain ethnic group. Say 2% of that ethnic group has a mutation but 0% of Caucasians and 0% of every other ethnic group. I believe the number reported by these sites will say 2%, which doesn't take into account that the first ethnic group has a low population.
The frequencies given are across all populations. Ethnic rates are available, but much less relevant, in my opinion. Genes function as they function, regardless of ethnicity - there may be some variations due to interactions with other SNPs which are more or less common in that group, but it's still coming down to the genes. Is an SNP with 25% prevalence in general, but 5% prevalence in Caucasians, going to be relevant to causing a fairly rare disease? Probably not, hence me not giving a damn if it's rare in a tiny and insular sample of people which isn't even properly representative of Europeans.
Using data from a specific ethnicity would also be arbitrarily exclusionary toward non-Europeans, and I far prefer an inclusive approach.