Big Data App to Explore Genomes for Clinical Relevance, Rare Variants, Drug Response, etc (Free)

Brandit · Oct 21, 2021

I'm using an older txt file, but I should probably download a new one to see if there are any differences.

Heron N · Nov 25, 2022

SWAlexander said:
Originally I was on the V3 chip, now V5.
V3 revealed much more. Example: gs224: https://rarediseases.org/rare-diseases/tetrahydrobiopterin-deficiency/

Have you read all the PubMed articles on this? It makes for interesting reading. I've got the same. It didn't turn up in my first 23andMe test, but it turned up in the Ancestry test that I did 6 months later and in my full genome, which I got the results for in June 2022. I hadn't put my Ancestry results through Promethease (doh!). I went back and checked and there it was, staring me in the face. Having done a full genome, I'm thankful that it turned up and I was able to get more answers.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7251883/ I think you will find this one the most illuminating on why those of us with this genetic problem have such horrible sleep disturbances.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8023510/#pone.0249608.ref007 and this one regarding treatment options.

There are a myriad of articles. Good luck.

Heron N · Nov 25, 2022

SWAlexander said:
"I think if we really want to know something we're going to have to get a full genome does anyone know what the cost is now?"
Even if you get a full genome, markers with "I" have no publication, meaning research does not have complete (peer-reviewed) answers yet.

I did my full genome for $499 with lifetime support from Nebula, but they apparently are not the best organisation to do it through. (That was $299 for the full genome and $200 for the support package.) I bought that in July 2021 in a half-price sale they had. I didn't send the test off until January 2022 and got the results by June 4th 2022. So in total it took me a year, but I've been busy since figuring out things from it, including gs224, like one other person here.

Heron N · Nov 26, 2022

kday said:
I am an ME/CFS patient and established member of this forum. I built a service called HGPAT, which stands for Human Genome Pathogenic Allele Tool. It is ready to be tested right now! I am looking for a better name, so new name suggestions are very welcome. This is a research tool so both citizen scientists and established scientists/researchers can make sense of raw genetic data when it comes to disease and risk factors. It's optimized for Desktop/Laptop/Tablet/Mobile, but I suggest Desktop/Laptop/Tablet to look at genome files as the app is very data heavy.

It's a powerful genomics tool so it may have a learning curve, but is extremely easy to use and has a UI that is meant for humans.

Just upload your 23andMe or AncestryDNA raw data here:
➡️ https://hgpat.tinybox.io

This is not the permanent domain. This is a temporary domain for user testing. The web application will be moved when it's ready and I will update the link when it happens.

The app shows rare variants <1% Minor Allele Frequency (MAF), uncommon variants <5% MAF, other variants that the algorithm thinks could be relevant, and lists genetic conditions that are reviewed by an NCBI assigned Expert Panel as well as variants that are in the Genetic Testing Registry (GTR).
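For anyone who wants to see how the frequency cut-offs above could be applied, here is a minimal sketch. It only assumes the two thresholds stated in the post (<1% and <5% MAF); the function name and the sample identifiers are made up for illustration and this is not the app's actual code.

```python
def classify_by_maf(maf: float) -> str:
    """Bucket a variant by minor allele frequency, given as a fraction (0.01 == 1%)."""
    if maf < 0.01:
        return "rare"
    if maf < 0.05:
        return "uncommon"
    return "common"


# Hypothetical example values, not real variants.
samples = {"variant_a": 0.004, "variant_b": 0.03, "variant_c": 0.21}
for name, maf in samples.items():
    print(name, classify_by_maf(maf))
```

Note that a variant at exactly 1% lands in "uncommon" here because the post uses strict "less than" cut-offs.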

View attachment 32658

The app compares all uploaded data to ClinVar, which includes almost 500,000 submitted variants. For each variant, there are many relevant links to third-party genetic websites for research. All associated diseases from ClinVar are listed for each variant, as well as descriptions of diseases when you click on them. It doesn't tell you what you definitively have and definitely don't have like Promethease. You are the interpreter. All data is pulled from databases and "curated" by algorithms instead of pulled from a hand-curated Wiki. The information, including the written summaries and written predictions for each variant, is completely automated. It assists you in research, and in some ways, could be considered more powerful than Promethease for disease/variant research and discovery. It uses Big Data techniques to predict relevance of SNPs and tells you whether or not it thinks the variant will have a negative impact on enzyme function (no matter if it's classified as pathogenic, benign, etc). The app utilizes a database called CADD that uses machine learning and other techniques to rank how deleterious variants are. Very cool stuff.

It currently works with 23andMe and AncestryDNA. It also works with Whole Genome Sequencing (WGS/WES) files, but I don't currently have the servers configured to handle this much data. It's actually extremely efficient at Whole Genome Data processing. The biggest problem is that I currently don't have an uploader installed to send the data in chunks, as you can be uploading upwards of a gigabyte of data. If you want to process your whole genome data, send me a message!
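To illustrate what "sending the data in chunks" means, here is a rough sketch of splitting a large file into fixed-size pieces. The function name and the chunk size are assumptions chosen for illustration, and the sketch only covers the splitting step, not the network transfer or anything about how the app's uploader would actually work.

```python
import io
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int = 5 * 1024 * 1024) -> Iterator[bytes]:
    """Yield successive pieces of at most chunk_size bytes until the stream is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Demo on a tiny in-memory "file" so the sketch runs without real genome data.
demo = io.BytesIO(b"x" * 25)
sizes = [len(chunk) for chunk in iter_chunks(demo, chunk_size=10)]
print(sizes)  # [10, 10, 5]
```

Each piece could then be sent as its own request, so a dropped connection only costs one chunk rather than the whole upload.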

It takes less than a minute to compute your 23andMe and Ancestry raw data and less than two minutes to process WGS/WES data after upload. However, support for WGS/WES is not currently configured, so please don't try as it will not work.

As some of you may know, I created Genetic Genie years ago. I've been working on this project for the past year. It is a complex project that took a lot of thought, brain power, and development. It's one part of the overall picture for an update of Genetic Genie. I wasn't physically or mentally capable of doing these complex tasks for many years, so sorry about the very slow updates! It is a research tool, not a diagnosis tool.

The app is set up to be tested. Please feel free to upload as many genomes as you like. Since the app consumes a lot of computing power and resources, it is set to automatically scale in the cloud. The more data that is run, the more I can evaluate how well the app scales! Your browser is confined to the same server instance, so if you run multiple files at the same time in the same web browser, processing of subsequent data will be queued until the other data is finished processing.

Oh, and please report BUGS on this thread! 🐞🐛🤢

Current Limitations and Bugs:

1) While 23andMe proprietary SNVs and indels are supported (a ton of work!), they are not always 100% accurate. This is because 23andMe uses proprietary identifiers for a lot of variants. To my knowledge, there hasn't been anyone that has completely reverse engineered 23andMe's proprietary identifiers in a reliable fashion. This is because several rsIDs and indels (insertions, deletions, etc) can co-exist at the exact same chromosome/position. Without knowing the reference and alt alleles, you cannot convert them all with 100% accuracy. 23andMe doesn't offer this data to the public. A workaround may be possible but takes a lot more programming and it is not part of this release. Often variants or indels at the very same location share the same clinical relevance, but this isn't always the case. [A successful workaround was implemented, and this issue is now considered resolved. Non-proprietary identifiers are now accurately called in 23andMe data even if there are multiple at the same chromosomal position.]

2) ~~Because of the above limitation,~~ we are not currently reporting more than one variant that is at the same exact position with 23andMe data. This may or may not change the results you see.

3) ~~The above limitations are not limitations with AncestryDNA data as they do not use proprietary identifiers. However,~~ because of the [previous] 23andMe limitations and trying to keep code consistent, AncestryDNA will not report multiple variants at the same location. This should be considered a bug that will be corrected.

4) Because these testing companies have used many chips over the years, their data is not always consistent and can be ambiguous between chips. For this reason, some data may not show up. For example, a founder mutation for BRCA1 that's common amongst Ashkenazi Jews won't show up. This is because of a combination of ambiguous data and not being able to tell exactly which mutation is meant across different chip versions. For example, one 23andMe chip version may show the reference allele for a deletion as II and another version of a 23andMe chip may show this data as DD. For deletions the reference allele should always be II and for insertions it should always be DD, but 23andMe scientists (in combination with Illumina, who makes the chips) aren't always consistent with their notation.

5) Tens of thousands of variants have been manually looked through and have had the data corrected. This doesn't mean that there won't be variants that are incorrect. However a lot of care was taken when filtering out noise and "bad variants." Information that you may find irrelevant will be returned along with data you find relevant. This is not a bug. This is by design as it's a robust research tool and some may find relevance in things that most others would consider irrelevant.

6) Drug responses may be corrected towards the risk of a good/bad response to the drug instead of the reference allele or minor allele frequency. This is intentional. However, the drug section can be confusing as sometimes a heterozygous variant carries the biggest risk, etc. This section will likely need some work to make the information more clear and easier to decipher.

7) Included is a text-based javascript filter so you can narrow down variants of interest. There are plans to create a button with a bunch of tools to filter variants in various ways. I'm considering creating "sort by" tools to reorder results as long as it proves to be efficient on the client end with javascript.

8) You currently can't download CSV files for results. This is a feature that many citizen scientists and established scientists would want. I'm trying to think of a way to maintain privacy and store as little data as possible on the servers. Right now, the servers are configured so I generally cannot see information about genetic data. I am exploring ways to dynamically generate CSV files without storing such information on the servers themselves. A bit of a puzzle headache. [I may have a potential way to solve this issue as well as maintain strict privacy.]

9) Data is deleted automatically after upload. Sometimes this fails if the query doesn't finish. In this case, these de-identified files will be automatically discarded (hopefully) within 24 hours or sooner. There are still some bugs though around discarding files. Nevertheless, files disappear by design as server instances start and stop (which is very often).

10) I updated the link names to use the word "Variants" instead of "Mutations". I also updated the language in the blue boxes on each page as it wasn't written well. However, I didn't deploy these updates properly. Oops! A minor bug I guess, but still a bug. [This was fixed, but I decided to continue to use the word "mutations" on the links.]
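Item 1 in the list above says that several identifiers can share one chromosome/position, which is why position alone can't always pin down a variant. A small sketch of that idea follows. The catalogue rows, identifiers, and function names are all hypothetical and only illustrate the ambiguity; this is not the workaround the app uses.

```python
from collections import defaultdict

# Hypothetical catalogue rows: (identifier, chromosome, position, ref allele, alt allele).
CATALOGUE = [
    ("rs_example_1", "1", 1000, "A", "G"),
    ("rs_example_2", "1", 1000, "A", "AT"),  # an insertion at the same position
    ("rs_example_3", "2", 5000, "C", "T"),
]


def index_by_position(rows):
    """Group catalogue entries by (chromosome, position)."""
    index = defaultdict(list)
    for ident, chrom, pos, ref, alt in rows:
        index[(chrom, pos)].append((ident, ref, alt))
    return index


def candidates_for(index, chrom, pos):
    """Return every identifier that could match a genotype reported at this position."""
    return [ident for ident, _ref, _alt in index.get((chrom, pos), [])]


index = index_by_position(CATALOGUE)
print(candidates_for(index, "1", 1000))  # two candidates: ambiguous without ref/alt alleles
print(candidates_for(index, "2", 5000))  # one candidate: unambiguous
```

When more than one candidate comes back, the reference and alt alleles are what would be needed to choose between them, which matches the limitation described in item 1.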

Privacy & Terms of Service

No official privacy policy or terms of service have been created yet. In general, the service collects about as much information about you and your data as a cheap calculator. There are internal logs that I don't generally look at unless there are problems that can list things like IP addresses, etc. Google Analytics is currently not used with this testing phase. So currently no tracking/analytics cookies either! There is only one cookie set so the load balancer can identify your web browser and know how to scale the web application for you and other users.

It's possible that I will see de-identified genomes during routine maintenance in the case they aren't already automatically scrubbed. This data will never be distributed/shared in any way. There is no identifying information on the file names, in the files, etc.

While I believe the service to be secure, hacks happen. In the case of a compromising hack, it's possible for de-identified genetic information to be re-identified (such as if your genetic information is on other databases associated with your name or alias). In the event that law enforcement or any other authority requests data, it's unlikely that we'll be able to provide them with much, if any data because of the architecture of the system. Though it's certainly possible that an authority could get access to other information such as website logs. Of course, I would never expect a genomics research tool to ever be of interest to law enforcement or other authorities.

It's certainly possible that genetic disease can be discovered with this tool. Some may find this type of information exciting and others may find such information scary or worrying, especially when they are looking at their own genome or a genome of a person they care about. If you are afraid of what you might find, do not use this tool! Since you were told this is a research tool and not a clinical diagnostics tool, we hold no liability if the service makes you seek a healthcare provider or further genetic testing. As I've stated numerous times, Big Data tools like this can and will have bugs, and therefore 100% accuracy can never be guaranteed. It can show a disease-causing variant that you don't really have as well as miss variants, giving a false sense of reassurance. Doing research can help determine if the variant is accurately portrayed, but 23andMe, Ancestry, and WGS/WES data can also be wrong. We cannot vouch for the accuracy of third-party data.

While it's possible that the data was misinterpreted by the service, it's also very possible for the user to misinterpret the data. This can also cause anxiety and cause one to seek medical care, especially if they are looking at their own genome or a genome of a person they are close to. Do your research, as links to many amazing research websites are provided for each variant!

Aside from the databases I created myself, databases used were generally obtained from sources such as the U.S. Government (NCBI/NIH), Universities, and other third parties that provide public access to their information. I reserve all rights to my intellectual property and methods used to create and display data. I believe the look and feel of the data is very unique to this app and consider the styling of it as a creative work of art. Users can save copies of genomes by downloading/saving a website copy of the information generated. At this point in time, I do not give rights for the data to be used for commercial purposes without explicit permission and do not give rights for the generated data to be sold or distributed without explicit permission.

View attachment 32659 View attachment 32660 View attachment 32661 View attachment 32658

I have you to thank for finding my celiac genes when I put them through this program a couple of years ago. Thank you so much.

Heron N · Nov 26, 2022

kday said:
I think one thing that is probably a big risk factor for some people with ME/CFS is MBL2 deficiency.

If one shows up with two MBL2 heterozygous (or homozygous) SNPs, they are likely to have Mannose-Binding Lectin Deficiency.

My favorite genetic disease. Because 5-10% of the population has it. ME/CFS patients are about double that number. And much higher in sub-Saharan Africans.

It's like the orphan disease that everyone ignores and nobody has heard of because it's too common to pay attention to. Big risk factor for so many immune related things.

Easily one of my favorite genes/diseases (after lactase persistence of course).

I have a similar life experience as you when it comes to milk allergy. I'm fine with it now. Allergic as a kid.

Is this the one you mean?

rs10824792 (C;T)
https://www.snpedia.com/index.php/Rs10824792
Mannose binding protein deficiency?

https://www.snpedia.com/index.php/rs4804803
rs4804803 (A;G)

Heron N · Nov 26, 2022

kday said:
Would anyone be interested in doing a patient collaborative study looking at whole genomes or am I a lone nutter? I have a small list of candidate genes/risks that I want to verify prevalence of and also want to see if the algorithms can find other risks. I have also developed algorithms that can look at all copy number variants and structural variants in a genome as well. The algorithm could automatically assess loss of function, missense, CADD scores, PolyPhen/SIFT deleteriousness scores, etc.

I'm with you on this one. I'm currently working through my full genome from Nebula on Promethease (I put it through Genetic Genie but it didn't do as well, sorry).
