Creating SNP panels - Feedback on appearance?

kday · Aug 6, 2013

Here are my rare SNPs (click on thumbnail below). The first one is Pseudo Tay Sachs I think.

Bluebell · Aug 6, 2013

kday said:
Here are my rare SNPs (click on thumbnail below). The first one is Pseudo Tay Sachs I think.

Is that report from GEDmatch?

Valentijn · Aug 7, 2013

kday said:
Here are my rare SNPs (click on thumbnail below). The first one is Pseudo Tay Sachs I think.

Interesting. The rsID is rs138058578, and it does create a missense mutation (HEXA R249W).

The good news is that being heterozygous shouldn't be a problem, unless having kids with another heterozygous person. http://omim.org/entry/606869 describes the mutations on that gene which are known to cause problems.

Because compound heterozygous mutations (two heterozygous mutations at a locus on a gene, as described in https://en.wikipedia.org/wiki/Compound_heterozygosity) can also be problematic, you might want to take a closer look at the rest of your results for the HEXA gene.

Bluebell · Aug 7, 2013

Valentijn said:
Interesting. The rsID is rs138058578, and it does create a missense mutation (HEXA R249W).

I have GG on that (i4000440), is GG the non-risky allele? Edit: I now see that AA is the risky allele.

I couldn't find that RS number in dbSNP: http://www.ncbi.nlm.nih.gov/snp/?term=rs138058578&SITE=NcbiHome&submit=Go

I couldn't find it in 23andMe except for this: https://www.23andme.com/health/tay_sachs/techreport/

"23andMe also reports data for two HEXA mutations, R247W and R249W, known as "pseudodeficiency alleles." These mutations are considered normal variations because they do not actually affect the activity of the hexosaminidase A enzyme. They do, however, cause false positive results in blood tests that measure hexosaminidase A activity that are used to screen for Tay-Sachs disease. People with one copy of either the R247W or R249W mutation would be identified as carriers for Tay-Sachs disease by this type of screening test. People who have one of these mutations in one copy of the HEXA gene and a true Tay-Sachs mutation in the other copy of the gene would be identified as having Tay-Sachs disease, even though they are actually unaffected carriers. It is important to note, however, that these people still have a 50% chance of passing on the disease-causing mutation to a child."

Valentijn · Aug 7, 2013

Bluebell said:
I couldn't find that RS number in dbSNP: http://www.ncbi.nlm.nih.gov/snp/?term=rs138058578&SITE=NcbiHome&submit=Go

You have to search without the "rs" at the front

Valentijn · Aug 7, 2013

kday said:
FYI: GEDmatch has a much more friendly interface now. Just looked at it and it has changed since I last used it.

Well, I don't know what the old one was like, but the current one just gave me a headache

They also insist on getting my real name, and having my real name on my results file, which matches up with a real name from 23andMe. I'm not particularly paranoid about these sorts of things, but that is asking WAY too much.

Bluebell · Aug 8, 2013

Valentijn said:
You have to search without the "rs" at the front

Actually, I clicked on 23andMe's automatic linking from my result to the dbSNP site -- like I always do. Clicking on that link is what brought up the notice that it wasn't in the dbSNP database. So, roll your eyes at 23andMe!

Bluebell · Aug 8, 2013

Bluebell said:
Actually, I clicked on 23andMe's automatic linking from my result to the dbSNP site -- like I always do. Clicking on that link is what brought up the notice that it wasn't in the dbSNP database. So, roll your eyes at 23andMe!

Valley, I went back to the error message link and took out "rs" and pressed enter.

Now I've got this -- http://www.ncbi.nlm.nih.gov/snp/?term=138058578

How do I get from this page to the part of dbSNP that I am used to seeing, which shows the population percentages of the alleles?

Valentijn · Aug 8, 2013

Bluebell said:
How do I get from this page to the part of dbSNP that I am used to seeing, which shows the population percentages of the alleles?

Click on the rs number in the upper left area (next to "1.").

This one doesn't have the standard prevalence data for the minor allele at the top of the page, but they do have the data from a huge group at the bottom.

Bluebell · Aug 8, 2013

Valentijn said:
Click on the rs number in the upper left area (next to "1.").

This one doesn't have the standard prevalence data for the minor allele at the top of the page, but they do have the data from a huge group at the bottom.

My gosh, 1 in 1000!

nandixon · Aug 8, 2013

kday said:
Here are my rare SNPs (click on thumbnail below). The first one is Pseudo Tay Sachs I think.

Are your GEDmatch rare allele results the same as your results from Ian Logan's rare allele (Minor Allele) program? Assuming you've also done the latter. Were there any different SNPs found between the two? Thanks.

kday · Aug 8, 2013

I haven't tried Ian Logan's minor allele program.

nandixon · Aug 9, 2013

kday said:
I haven't tried Ian Logan's minor allele program.

Okay. I went ahead and ran GEDmatch's rare allele program on my 23andMe data. There's mostly overlap with the results from Ian Logan's program, but also a few new hits. The new hits turned out to be either not useful (at least in my case) or not actually rare, i.e., there were some frequency errors. Additionally, several of the gene names were outdated. Difficult to say whether it'd be worthwhile for other people to run. I did find out I apparently have a weird blood phenotype.

Valentijn · Aug 9, 2013

nandixon said:
The new hits turned out to be either not useful (at least in my case) or not actually rare, i.e., there were some frequency errors. Additionally, several of the gene names were outdated. Difficult to say whether it'd be worthwhile for other people to run. I did find out I apparently have a weird blood phenotype.

Yeah, when downloading the 1000genome prevalence data, I noticed that there's quite a few results which are inverted - that is, the "reference" allele is the less common one, and the "alternative" allele is the common one. Maybe GEDmatch is getting both the high and low prevalence alleles (less than 1% and greater than 99%), but not swapping the alleles to reflect the switch.
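
To illustrate the swap described above, here is a minimal sketch of how an inverted entry could be normalized so that the rarer allele is always the one reported. The function names and the 1% cutoff parameter are made up for this example, and it assumes a simple two-allele site where the reference frequency is just one minus the alternative frequency. It is not how GEDmatch or any other site actually does it.

```python
def minor_allele(ref, alt, alt_freq):
    """Return (allele, frequency) for whichever allele is rarer.

    alt_freq is the alternative allele's frequency as a fraction (0..1).
    Assumption: the site is biallelic, so ref_freq = 1 - alt_freq.
    """
    # Rounding avoids floating-point noise such as 1 - 0.99 = 0.010000000000000009
    ref_freq = round(1.0 - alt_freq, 6)
    if alt_freq <= ref_freq:
        return alt, alt_freq
    # Inverted case: the "reference" allele is actually the rare one
    return ref, ref_freq


def is_rare(ref, alt, alt_freq, cutoff=0.01):
    """True if the rarer of the two alleles is at or below the cutoff."""
    _, freq = minor_allele(ref, alt, alt_freq)
    return freq <= cutoff
```

With this, an entry listed as alternative allele A at 99.5% would be reported as reference allele G at 0.5%, instead of flagging the common allele as rare.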

And I was able to run GEDmatch eventually, without my real name, by putting in a fake name in my data on the website, changing the name of my 23andMe file to match the fake name and the 23andMe naming format, re-zipping the file, and then uploading it.

One thing I do like about their site is that I can choose how many rare alleles to display, and select to just see homozygous ones, which are more likely to cause severe problems than heterozygous. But it's not user-friendly enough for brain-fogged ME patients, especially ones who aren't as computer-savvy, and the insistence on using your real name is extremely creepy.

In related news, the rare allele program that me and my fiance are working on now has seen some progress. We've now got a very bare text file of 20,062 alleles with a prevalence of 1% or less, and when zipped it's at 107 KB, which is hella small and should be easy to download, even with dialup. Mr Valentijn has started working on a program to compare the rare lists to the user's 23andMe results, which should also result in a very small program that can be downloaded and run locally.
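
For anyone curious what the comparison step might look like, here is a rough Python sketch (the actual program is in Java and may work quite differently). The 23andMe raw data is a tab-separated text file with `#` comment lines followed by rsid, chromosome, position, and genotype columns. The layout of the rare allele list (rsid, rare allele, frequency, tab-separated) is purely an assumption for this example, as are the function names.

```python
import csv
import io


def load_rare_alleles(handle):
    """Read a hypothetical rare-allele list: rsid <TAB> allele <TAB> frequency."""
    rare = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3 or row[0].startswith("#"):
            continue  # skip comments and malformed rows
        rare[row[0]] = (row[1], float(row[2]))
    return rare


def find_hits(genome_handle, rare):
    """Return rows from a 23andMe raw file whose genotype carries a rare allele."""
    hits = []
    for line in genome_handle:
        if line.startswith("#") or not line.strip():
            continue  # header comments and blank lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 4:
            continue
        rsid, chrom, pos, genotype = fields[:4]
        entry = rare.get(rsid)
        if entry is None:
            continue
        allele, freq = entry
        if allele in genotype:  # no-calls like "--" never match
            hits.append((rsid, chrom, int(pos), allele, freq, genotype))
    return hits


# Tiny in-memory example instead of real files
rare_list = load_rare_alleles(io.StringIO("rs1\tA\t0.005\nrs2\tT\t0.01\n"))
genome = io.StringIO("# comment\nrs1\t1\t100\tAG\nrs2\t1\t200\tCC\nrs3\t2\t300\tAA\n")
for hit in find_hits(genome, rare_list):
    print("\t".join(str(value) for value in hit))
```

Loading the rare list into a dictionary first means the large 23andMe file only has to be read once, line by line, which keeps memory use low on older machines.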

We'll also be able to do another version on a website, preferably one which is very simple to use (to the tune of just making a couple mouse clicks). Additionally we should be able to do one which is offering more options, including looking at less-rare alleles (up to 10%), etc which could be especially useful in finding rare homozygous results.

And I'm still working on compiling a geneticgenie-style list of methylation SNPs nicely supported by research. I've got most of them listed already, but still need to find a few more and then make a list of the relevant research.

Bluebell · Aug 9, 2013

You are putting in lots of time, effort, and passion, V! It sounds like it will be a useful program for many, many people.

Valentijn · Aug 23, 2013

The downloadable rare gene program is done, aside from some minor tweaking. Basically it uses Java, but is a version which is compatible both with modern computers as well as the dinosaurs which some of us have.

The program itself is about 2MB (for comparison, the 23andMe file is 23.6MB), and extremely user-friendly. Basically a box with three big buttons comes up. The top button is for selecting a rare gene database, but if there's one in the directory with the program, it'll select it automatically. If no database is selected, this button is orange, and turns green when the database has been selected (automatically or manually). This setup will make it easy for updated or alternate databases (prevalence of 5%, 10%, 2%, etc) to be downloaded and used.

The middle button is for selecting your 23andMe file. This starts off orange, and turns green when an appropriate file has been selected. It won't allow selection of a non-23andMe file.
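
One simple way a file check like this could work, sketched in Python rather than Java: look at the first few data lines and confirm they have the four tab-separated columns that a 23andMe raw file uses (rsid, chromosome, position, genotype). The function name, the `max_lines` limit, and the exact rules are assumptions for this example, not the actual check in the program.

```python
import io


def looks_like_23andme(handle, max_lines=50):
    """Heuristic check of the first data lines of a candidate raw data file."""
    checked = 0
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # comment header or blank line
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 4:
            return False
        rsid, _chrom, pos, _genotype = fields
        # Identifiers are "rs" numbers or 23andMe's internal "i" numbers
        if not (rsid.startswith("rs") or rsid.startswith("i")):
            return False
        if not pos.isdigit():
            return False
        checked += 1
        if checked >= max_lines:
            break
    # A file with no data lines at all is not accepted
    return checked > 0


print(looks_like_23andme(io.StringIO("# header\nrs1\t1\t100\tAG\ni4000440\t15\t200\tGG\n")))
print(looks_like_23andme(io.StringIO("name,value\nfoo,1\n")))
```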

The bottom button is for running the program. I think it starts off as yellow, then turns green when running. A progress bar appears while results are run (takes about 30 seconds), with each "hit" added to a tab-separated table as it is found. The buttons can't be hit while the program is running.

When it's done running, you have a tab-separated list of all variations, in order by chromosome and location (sample output is attached to this post). They are listed by 23andMe rsID and "i" number, and the rare allele, its prevalence (converted to a percentage instead of a decimal number), and the user's genotype are also listed. These results can be cut and pasted into a text (or other) file, and there's a button for selecting to convert into a .csv (comma-separated values, useful for opening with Excel or similar).
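
The tab-separated to .csv conversion is simple enough to sketch with Python's standard `csv` module. This is only an illustration of the idea, with a made-up function name and made-up sample columns, and not the code behind the button in the program.

```python
import csv
import io


def tsv_to_csv(tsv_text):
    """Convert tab-separated text to comma-separated text."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    # csv.writer adds quotes around any field that itself contains a comma
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


sample = "rsid\tallele\tpercent\tgenotype\nrs1\tA\t0.5\tAG\n"
print(tsv_to_csv(sample))
```

Using a CSV writer rather than a plain find-and-replace of tabs keeps the output valid even if a field ever contains a comma.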

We also need to add a flag for homozygous results (in text form), and I think we might try to add in "rs" numbers where 23andMe has used "i" numbers, to make it easier to find additional info. We should probably add location back in too, so that anyone can sort them back into proper order if playing with Excel.
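
A text flag for homozygous results could be as simple as counting how many copies of the rare allele appear in the genotype string. The sketch below is in Python and the labels are just suggestions. It assumes genotypes are one or two letters, with a single letter treated as one copy on a chromosome that only has one copy (for example X in males).

```python
def zygosity(genotype, rare_allele):
    """Label a genotype by how many copies of the rare allele it carries."""
    copies = genotype.count(rare_allele)
    if copies == 0:
        return "absent"  # also covers no-calls such as "--"
    if len(genotype) == 1:
        return "hemizygous"  # single-letter call, only one copy exists
    if copies == len(genotype):
        return "homozygous"
    return "heterozygous"


for genotype in ("AA", "AG", "GG", "A", "--"):
    print(genotype, zygosity(genotype, "A"))
```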

Insertions and deletions aren't included currently, for a few reasons: 1) the raw data for prevalence rates is pretty garbled, 2) most insertion/deletion SNPs don't have prevalence data anyhow, and related to this is 3) 23andMe has trouble detecting insertions and deletions reliably. This is probably something we'll look into more, but it would delay the project considerably if we attempted to get it working at this stage.

But basically ... it's a small download, it's easy to use, it's fast, and thus far it seems very accurate. For the 1% and under bunch, it gives me a list of 139 rare SNPs. I've looked into the first 70 and confirmed that they are indeed rare according to other sources, and it's reporting the rare alleles and my genotype accurately.

I still need to compare it to similar programs on the internet, to see if there's something they're picking up which we're missing - this is unlikely, since we're generating more results, possibly due to using the most recent prevalence data from 1000genomes, or due to cutting off at 1% instead of under 1%. Cutting it off at under 1% does keep the list much smaller, but I think that also creates an artificial distinction where there shouldn't be one: an SNP with a prevalence of 1.0% is very nearly as relevant as an SNP with a prevalence of 0.99%, so it doesn't make much sense to flag one and ignore the other.

This problem disappears after the 1% level, as after that everything is rounded to a whole percentage. So you only get decimal points under 1%, but otherwise everything is rounded to 2%, 4%, etc. The downside is that you're getting a lot more SNPs: in the neighborhood of 139 instead of 53. The upside is that you're getting more homozygous results, which are the ones most likely to cause major malfunctions. In fact, the 10% cut off is likely to produce a huge list, but will pick up on homozygous results that are essentially in the 1% or below range for genotype, yet would otherwise be missed due to each minor allele being too common.
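
The arithmetic behind the 10% cut off can be sketched as follows. It assumes Hardy-Weinberg proportions (random mating, no selection), which is a textbook simplification brought in for this example and not something taken from the prevalence data itself: under that assumption, the expected share of people homozygous for an allele is the allele frequency squared. The function names are made up.

```python
def homozygous_frequency(allele_freq):
    """Expected fraction of people with two copies, assuming Hardy-Weinberg."""
    return allele_freq ** 2


def allele_cutoff_for_genotype(genotype_cutoff):
    """Allele frequency whose homozygous genotype frequency equals the cutoff."""
    return genotype_cutoff ** 0.5


for freq in (0.01, 0.05, 0.10):
    share = homozygous_frequency(freq)
    print(f"allele at {freq:.0%} -> homozygous genotype at about {share:.2%}")
```

So an allele carried at 10% gives a homozygous genotype at roughly 1%, which is why a 10% allele cut off is the natural match for a 1% genotype cut off.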

roxie60 · Aug 23, 2013

Valentijn maybe my brain is just sleepy but where is the link to the program to test it out?? Thanks for all the effort should be interesting.

Valentijn · Aug 23, 2013

roxie60 said:
Valentijn maybe my brain is just sleepy but where is the link to the program to test it out?? Thanks for all the effort should be interesting.

It's not up for testing yet - still doing some final tweaking. It's also a downloadable program, not the version usable on a website.

roxie60 · Aug 23, 2013

Not a problem. In my former healthy life I am/was a systems programmer so familiar with computers even if my brain has appeared to atrophy the last couple of years..... the big iron, the little computers I'm just an informed layman
