Big Data App to Explore Genomes for Clinical Relevance, Rare Variants, Drug Response, etc (Free)

nandixon

Senior Member
Messages
1,092
For example, for rs199535154 that @wigglethemouse mentions on S4Me, which is a SNP on CYP2D6, the study in both of its Supplementary Tables (1 & 2) gives the ME/CFS patients as having a 94% frequency for (presumably) the variant allele. But the variant allele in the general population is only found at a frequency of around 0.2% according to both the study's Supplementary Tables and according to dbSNP:
https://www.ncbi.nlm.nih.gov/snp/rs199535154

Not to mention that that (variant) SNP doesn't even meet the study's 10% frequency threshold requirement - except by virtue of the error of finding/saying the patients have a 94% frequency... which is impossible (I gave 3 possible explanations for this error in my other post that Wiggle linked to).
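
To make the size of that discrepancy concrete, here is a rough Python sketch of the kind of sanity check that would catch it. The function names and the 50-fold cutoff are assumptions made for illustration, not anything taken from the study; only the 94%, 0.2% and 10% figures come from the post above.

```python
def meets_threshold(freq, threshold=0.10):
    """True if an allele frequency reaches the study's stated 10% threshold."""
    return freq >= threshold


def flag_implausible(cohort_freq, population_freq, max_fold=50.0):
    """Flag a variant whose cohort frequency exceeds the population
    frequency by more than max_fold (a hypothetical cutoff)."""
    if population_freq <= 0:
        # No reference frequency to compare against: flag any non-zero call.
        return cohort_freq > 0
    return cohort_freq / population_freq > max_fold


# Figures quoted above for rs199535154.
patients = 0.94      # frequency reported for the ME/CFS patients
population = 0.002   # approximate general-population frequency

print(meets_threshold(population))             # population value vs. 10% threshold
print(flag_implausible(patients, population))  # roughly a 470-fold difference
```

A check like this only highlights rows worth a second look; it does not say which of the possible explanations for the error applies.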
 

nandixon

Senior Member
Messages
1,092
We'd better not hijack @kday's thread anymore, but I took another look at the data in the Klimas study:
https://www.frontiersin.org/articles/10.3389/fped.2019.00206/full

and there are just way too many errors. I began thinking that it's as if novice students rather than scientists with a proper understanding of genetics did the analysis of the data. Well, here's from the Acknowledgements section of the paper:

Acknowledgments
We acknowledge the student of Dr. Kiran C. Patel College of Osteopathic Medicine of Nova Southeastern University Christopher Larrimore for his help in managing RedCap database. We would like to thank students of Halmos College of Natural Sciences and Oceanography of Nova Southeastern University Valentina Ramirez, Maria Cash and Pallavi Samudrala for their help with the analysis of data.


This paper probably needs to be retracted. I'm not sure what can be salvaged. Certainly the data needs to be completely reanalyzed.
 

kday

Senior Member
Messages
369
I rolled out an update that converts all mtDNA positioning from Yoruba to Cambridge if uploading hg19 files.

Other minor changes have been made on the backend for automatically determining the assembly (the VCF specification made this way more difficult than it should be), and the Genetic Conditions tab will show some new things and no longer show some things that were there before. The algorithm was made consistent with the way it processes 23andMe files.
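
For anyone curious why assembly detection is awkward: the VCF header is not required to name the reference, so one common workaround is to look at the contig lengths when they are present. The Python sketch below is only an illustration of that idea and is not necessarily how the app does it; the function name is made up, and it relies on the published chromosome 1 lengths for GRCh37/hg19 and GRCh38.

```python
import re

# Chromosome 1 length differs between the two assemblies.
CHR1_LENGTHS = {
    249250621: "GRCh37/hg19",
    248956422: "GRCh38/hg38",
}

CONTIG_RE = re.compile(r"^##contig=<(.*)>\s*$")


def guess_assembly(header_lines):
    """Guess the assembly from ##contig header lines, or return None.

    Simplified parsing: assumes no quoted values containing commas.
    """
    for line in header_lines:
        match = CONTIG_RE.match(line)
        if not match:
            continue
        fields = dict(
            part.split("=", 1)
            for part in match.group(1).split(",")
            if "=" in part
        )
        if fields.get("ID") in ("1", "chr1") and "length" in fields:
            try:
                length = int(fields["length"])
            except ValueError:
                continue
            return CHR1_LENGTHS.get(length)
    # Header has no usable contig line; another method would be needed.
    return None


header = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=chr1,length=248956422>",
]
print(guess_assembly(header))
```

Files without `##contig` lines return `None` here, which is exactly the case that needs a fallback such as checking known positions against each reference.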

That said, I have still not corrected bad variants in this update. You may see a bad variant or two under Genetic Conditions currently. I haven't made filters for WGS/WES VCF files yet. But there are filters for Ancestry and 23andMe as stated before. However, more needs to be done for filtering 23andMe/Ancestry mitochondrial variants, as it's still a mitochondrial variant mess!

It's also not compatible with GRCh38 yet. I've already prepared the necessary database files to provide support, I just have to add some code.

One more note that I am not sure if I mentioned before: there is currently no support for multi-allelic variants. This takes some more code and algorithms to get it to work correctly.
 

kday

Senior Member
Messages
369
I rolled out an update that curates the Genetic Conditions category a little bit. Some other minor changes that the user probably won't notice.

I also optimized the code/workflow. So I can make changes very fast (whether it's WGS, 23andMe, or Ancestry) and roll them out very quickly.

So if anyone finds things that need correcting, now is the time to tell me!

I plan on incorporating the GET-EVIDENCE database that has user curated annotations for variants (from Harvard PGP). I think I am going to hide the automated predictions and have this data instead. Though one will be able to enable/disable the automated predictions. But GRCh38 support comes first. Does anyone know where I can find a clinical-grade b38 VCF file?
 

kday

Senior Member
Messages
369
@kday Klimas et al released the data for their 23andMe genetic study today. The full text of this study is now available here:
https://www.frontiersin.org/articles/10.3389/fped.2019.00206/full

@nandixon has a post here questioning data
https://forums.phoenixrising.me/threads/the-ido-metabolic-trap-guy.62727/post-2206434

I have some questions too on the S4ME thread (my personal v5 23andMe file has only 190 SNPs out of the top 525 CADD SNPs listed in table 2, and BCAM, one of their top highlighted genes, has the wrong frequency)
https://www.s4me.info/threads/genet...erez-nathanson-klimas-et-al.9415/#post-170872

Would it be possible to somehow run supplementary table 2 through your tools to check the frequency data and miscalls on the 23andMe and highlight errors that we could then pass on to interested parties? Table 1 and table 2 supplementary data is here
https://www.frontiersin.org/articles/10.3389/fped.2019.00206/full#supplementary-material

Many thanks,
Wiggle
@wigglethemouse

Sorry, missed this. You want to check frequency data for each variant? You can copy/paste the rsid and look it up at https://gnomad.broadinstitute.org.

You can then copy/paste the SNP to http://opensnp.org to see frequency data in 23andMe. And you will probably find that 23andMe has many miscalls based on this frequency data.

Indeed, there are a lot of errors in those tables and that study, sadly.
 

kday

Senior Member
Messages
369
More of a major update is being rolled out. I've temporarily got rid of the automated predictions and replaced them with Harvard PGP's GET-Evidence database. This is human-curated content about variants.

In an update, you will be able to turn on and off automated predictions as well as the GET-Evidence summaries.

Give it about 15 minutes from creation of this post (it takes a bit for the update to roll out) if you want to try out this new feature.
 

wigglethemouse

Senior Member
Messages
776
@kday I tested your latest version twice today, and both times "Save Page As" did not save a page locally that can be re-opened. I am using Firefox. The 5/17 version saved fine.
 

kday

Senior Member
Messages
369
@wigglethemouse

Just tried with Firefox and saved a file. Worked fine here. Though if saving in Chrome there is an issue where it has trouble switching tabs. But apparently this has been an issue from the beginning after looking at saved Chrome files.

So Firefox works and actually saves pages better. Not sure why it's not working on your end. Nevertheless, I think this is a lesson for me not to trust browsers to save pages correctly!

Chrome, on the other hand, thinks it's smarter than my code, so it modifies some HTML link locations when it saves. This appears to be a Chrome bug, and I have no control over their mistake other than reporting the problem as a bug.

I probably should cache the web pages and offer a link for someone to save it as a webpage, csv, or pdf. Future plans.
 

Moof

Senior Member
Messages
778
Location
UK
Curious about red hair variant.. using those terms, my red hair didn't show up while using the search box...

Can you describe where you found red hair variant?

Sorry not to reply, I've been away. It's the MC1R gene, which shows up under both 'Uncommon Mutations' and 'Other Risks'. I'm from a family with both red hair and a higher than usual incidence of malignant melanoma (although this gene hasn't been linked conclusively to MM risk).

Other genes may also produce red hair. It's said to be a recessive trait, but I have red hair with only one heterozygous MC1R variant – I think it's probably quite complicated!
 

Moof

Senior Member
Messages
778
Location
UK
Could I ask a question about WGS files, please, @kday? I have some, but the file format is 'xxxxx.vcf.gz.vcf'.

The engine doesn't like these, but simply removing the second '.vcf' from the filename doesn't work (I didn't imagine it would, but it's always worth trying! :rofl:). I'm not really familiar with this format, other than for address cards, so I don't know what to do for the best. Thank you!

ETA: the error displayed was 'bcftools index failed. Exited with error code 255'
 

kday

Senior Member
Messages
369
@Moof

Yeah, .vcf.gz.vcf is not a file format! Where were you sequenced? Maybe try downloading the file again from whoever sequenced you to see if you accidentally altered the file format?

If you put it on Dropbox or Google Drive and give me a link, I can probably restore it to its proper format.
 

kday

Senior Member
Messages
369
New update rolled out with GRCh38 support!

GRCh38 assembly support has been added. I optimized a lot of code to properly support both phased and unphased VCF files as well. And I created algorithms to call multi-allelic variants correctly if one were to come up.
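
As a rough illustration of what phased, unphased and multi-allelic calls look like in a VCF record, here is a minimal Python sketch that resolves a `GT` value against the `REF` and `ALT` columns. The function name is hypothetical and this is not the app's actual code; it only follows the VCF convention that `0` is the reference allele, `1`, `2`, ... index the comma-separated ALT alleles, `|` marks a phased call and `/` an unphased one.

```python
import re


def called_alleles(ref, alt, gt):
    """Return (alleles, phased) for one sample's GT string.

    alleles is a list of sequences, with None for a missing call (".").
    """
    # Index 0 is REF; indexes 1..n are the comma-separated ALT alleles.
    choices = [ref] + alt.split(",")
    phased = "|" in gt
    alleles = []
    for index in re.split(r"[/|]", gt):
        alleles.append(None if index == "." else choices[int(index)])
    return alleles, phased


# Multi-allelic, unphased: the sample carries both alternate alleles.
print(called_alleles("A", "G,T", "1/2"))
# Bi-allelic, phased heterozygote.
print(called_alleles("A", "G", "0|1"))
```

The multi-allelic case is the one that breaks simple parsers, because a genotype of `1/2` contains no reference allele at all.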

So a lot of backend changes. Maybe not too visible on the surface. But it's possible that results might differ a little bit with this update.
 

Moof

Senior Member
Messages
778
Location
UK
Yeah, .vcf.gz.vcf is not a file format! Where were you sequenced? Maybe try downloading the file again from whoever sequenced you to see if you accidentally altered the file format?

Thank you! I think it might possibly be some sort of compression thing, as when I tried removing the extra .vcf at the end of one of them, the icon took on the appearance of a compressed file on my Mac. The files are downloaded directly from Dante's site – I'll PM you a Dropbox link, if that's okay.
 

Moof

Senior Member
Messages
778
Location
UK
@Moof .gz is a compressed archive format. Try renaming the file to .vcf.gz and then unzip the contents to get the .vcf

Thank you! I did try that earlier, and uploaded the resulting file – I got an error message from @kday's site. I've sent a Dropbox link, though...but I will have another go in the meantime.
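
If renaming by hand keeps failing, one way to find out what a file like 'xxxxx.vcf.gz.vcf' really contains is to ignore the extension and look at its first two bytes: gzip (and bgzip) data starts with `1f 8b`. The Python sketch below is a suggestion under that assumption, with made-up function names; it writes out a plain-text .vcf either way.

```python
import gzip
import shutil
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip/bgzip stream


def is_gzip(path):
    """Check the file's magic bytes rather than trusting its extension."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def to_plain_vcf(src, dest):
    """Write an uncompressed copy of src to dest and return its path.

    src and dest must be different files.
    """
    if is_gzip(src):
        # Still compressed despite the trailing ".vcf": decompress it.
        with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
            shutil.copyfileobj(fin, fout)
    else:
        # Already plain text: just copy it under the new name.
        shutil.copyfile(src, dest)
    return Path(dest)
```

For example, `to_plain_vcf("xxxxx.vcf.gz.vcf", "xxxxx.vcf")` would produce a file whose first line should start with `##fileformat=VCF` if the download is intact.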
 

wigglethemouse

Senior Member
Messages
776
Just tried with Firefox and saved a file. Worked fine here. Though if saving in Chrome there is an issue where it has trouble switching tabs. But apparently this has been an issue from the beginning after looking at saved Chrome files.

So Firefox works and actually saves pages better. Not sure why it's not working on your end. Nevertheless, I think this is a lesson for me not to trust browsers to save pages correctly!
I compared the previously successful save to the ones that now fail. When it worked the following files were saved in directory "Variant Report_files". That directory is no longer saved.
Variant Report_files
- all.css
- circle-progress.js
- jquery.js
- jquery_002.js
- jquery-ui.js

The browser window just shows a bunch of text. It seems without the saved files the formatting is lost. Could it be that you have the necessary files stored locally?

EDIT: I looked at the saved .html file. It has no html headers. Weird, just starts with text. Firefox v67.0
Code:
Doing Science...

*100/%/

*

**


  Variant Report

//

  * Genetic Conditions <#conditions>
  * Drug Response <#drugs>
  * Other Risks <#risks>
  * Rare Mutations <#rare>
  * Uncommon Mutations <#uncommon>


Filtering Variants

Showing variants that contain the term "mt"
 

kday

Senior Member
Messages
369
@wigglethemouse

I can't reproduce this. Some ideas I have are to save in a different directory or to reinstall the browser. I'm testing with the same version of the browser.

What OS?
 

kday

Senior Member
Messages
369
Would anyone be interested in doing a patient collaborative study looking at whole genomes, or am I a lone nutter? I have a small list of candidate genes/risks that I want to verify the prevalence of, and I also want to see if the algorithms can find other risks. I have also developed algorithms that can look at all copy number variants and structural variants in a genome as well. The algorithm could automatically assess loss of function, missense, CADD scores, PolyPhen/SIFT deleteriousness scores, etc.

I may sound crazy, but I have this idea of a semi-automated dynamic group study with data that automatically (and instantly) updates when more information is received and as algorithms and methods change. Text of the page can be edited in a Wiki-style format. I don't think anything of this nature exists. But it's potentially a very good way of quickly finding genetic risk factors without a lot of work. And methods can be refined over time with the peer review that will naturally occur. Unfortunately, the recent genetic study wasn't much of a help. The algorithms I've already developed can make a web application like this very possible.

I don't think this syndrome is genetic in origin. But I think there could be a few major risk factors that haven't been uncovered by institutional research.
 

kday

Senior Member
Messages
369
Ah, never mind my previous post. Perhaps my mind is getting too creative. I guess I am just sick of things moving so slowly. It's been a decade for me and I feel like I nearly gave up completely not too long ago.
 