
Creating SNP panels - Feedback on appearance?

Discussion in 'Genetic Testing and SNPs' started by Valentijn, Jun 4, 2013.

  1. roxie60

    roxie60 Senior Member

    Messages:
    1,579
    Likes:
    439
    Central Illinois, USA
    Well, there goes whatever understanding I thought I had. So if we are looking at Genetic Genie, for example, you are saying that just because the +/+ is red doesn't mean it is defective (a mutation not expressing itself properly, resulting in functional impairment)? How in the heck can we make sense of the genotype info we are getting? So your panels identify the most likely genotypes associated with a condition based on a study, and further flag any homozygous or heterozygous result that has been specifically identified as defective (indicated by an !)? Am I getting it right?

    I think a lot of people are taking +/+ / RED to be a sure sign of defunct gene expression.
  2. Sea

    Sea Senior Member

    Messages:
    636
    Likes:
    642
    NSW Australia
    The +/+ means homozygote positive for a particular mutation but doesn't tell us whether there is any risk associated with that change. Some SNPs are not important at all. The SNPs that are included in Genetic Genie are there because they do have the potential to alter methylation or detox, so yes +/+ red = bad. If we are simply compiling charts of SNPs to find differences and similarities in particular groups then the red won't be that helpful, because not all SNPs will be faulty.
    Valentijn and roxie60 like this.
  3. Valentijn

    Valentijn Activity Level: 3

    Messages:
    5,589
    Likes:
    7,139
    Amersfoort, Netherlands
    To a large extent it is a sign of a problem. But in some cases it is such a common problem (like with the MAO SNP shown) that it's hard to take it seriously as an indicator of a problem with MAO functioning by itself.

    But +/- is often used when the heterozygous result indicates no problem at all. And sometimes that system could show +/- when the heterozygous result is just as bad as, or worse than, the red +/+ result.

    A lot of research is needed to make sense of the genotype info, and the existing panels I've seen don't even cite to sources for the genes they're using. It can be very hard to find relevant SNPs to put into a panel, because for some genes there is little or no research of the SNPs involved, even when it's obvious that the gene itself is important. When that happens, I think it's better to leave out irrelevant (or unresearched) SNPs instead of implying that we actually know anything about them - or we need to create the panel in a way that makes it very clear that although these SNPs are part of the gene, it is not known how they affect the functioning of that gene, if they affect it at all.

    For example, there's a gene implicated in folic acid intolerance. Richvank was suggesting that SNP results for that gene MIGHT give an indication of folic acid intolerance, and posted his own results for that gene as a presumably healthy "control" case. There is absolutely no research on any of the 23andMe SNPs associated with that gene. So it might be fun or informative to compare our results as ME/CFS patients to Richvank's result, but it's important that the lack of research about those genes is very clear, and that the panel isn't showing increased risk of anything. Richvank made those limitations perfectly clear, yet such a comparison could still be productive in that we'd be conducting our own non-rigorous research into the area to some extent.

    So while SNPs with an unknown effect can be useful or interesting to look at, I think it's essential to 1) make it clear when those mystery SNPs are mysteries, and 2) keep them separate from panels where it is known how many of the SNPs are affecting their genes, since when you get an in-depth explanation of how some SNPs are operating, you're going to assume there's a similar basis for knowing how the mystery SNPs operate.
    roxie60 likes this.
  4. Valentijn

    Valentijn Activity Level: 3

    Messages:
    5,589
    Likes:
    7,139
    Amersfoort, Netherlands
    A related question, though more on the subject of "content" than "appearance".

    I'm working on a methylation panel. For most genes there are at least one or two of the 23andMe SNPs known to be relevant in the functioning of that gene. But there are also 4-5 times as many SNPs which are rare or extremely rare - do you want to see the results for SNPs with rare genotypes displayed too?

    My concern about not showing them is that it leaves out some information about a potential oddity, which could then be investigated further. My concern about showing them is that it will make the results much more spammy and include potentially irrelevant results.

    If I do leave them in, they'll look different than the known dysfunctional SNPs. They won't be bolded, and will have a "?" instead of a "!".

    One possibility is leaving out most of the SNPs with rare genotypes, unless it's a REALLY rare genotype, such as being prevalent in less than 1 or 2% of the population.
  5. LaurieL

    LaurieL Senior Member

    Messages:
    447
    Likes:
    235
    Midwest
    How about making the rare genotypes available, with the corresponding information, through a link embedded in the page? This will leave out the confusion, and you can be as wordy as you like in the corresponding info. Then you wouldn't have to include a "?" per se, so all the information will still stay uniform. "International Code"

    Valentijn said...
    Absolutely, unequivocally, YES! I am tired of chasing SNPs and would be appreciative of a little legwork displayed by some of these sites.
    Valentijn likes this.
  6. Bluebell

    Bluebell More % Neanderthal than Adreno but less hairy :-D

    Messages:
    392
    Likes:
    208
    Hi Valentijn,

    I have not read through what everyone else has written here, so I might be repeating what someone else has said. I might also have misunderstood some things.

    With those caveats, these are my first thoughts on what you asked in your initial post:


    1. One piece of data per cell

    Trying to load too much information into the same column (by using exclamation marks, bolded text, asterisks, colors, or whatever) makes it confusing. Spread the information out so that each cell only provides one key bit of data.

    In other words, have more columns. There is no inherent benefit in having as few columns as possible (if the chart is allowed to fill an entire printed page or computer screen, and isn't required to fit into a tiny 2-inch-square space in a journal article or something).

    You can easily fit 6 or 8 columns in a chart that covers an A4/letter-size printed page, and it would be easier for a novice to absorb the information presented that way.

    As an example, the chart I made for my 23andme results that I attached to my thread has 7 columns (actually, my chart's got 20 columns in real life, but those first 7 were the pertinent ones that I wanted to convey to the people who viewed my thread).
    http://forums.phoenixrising.me/inde...hts-and-pointers-on-my-23andme-results.23837/
    I don't think it looks too busy or overwhelming. Obviously, I didn't design it to provide all the information that you want to provide in your chart, so I'm not suggesting mine as any sort of template, merely using it as an example of how adding more columns can make this genetic information easier to understand.


    2. Words instead of symbols

    Do not be afraid of using words to describe characteristics. Not everyone "thinks" in symbols and can automatically translate from a symbol to the idea that it represents: seeing exclamation points, asterisks, bold font, plusses and minusses, highlight colors, etc. only confuses them and slows them down, rather than speeds up their processing of the data that is being presented to them.

    If your chart is not just going to be used by a small selection of repeat customers --such as medical doctors or genetic counselors who would use it for many of their patients and quickly would grasp the meaning of the symbols used-- but instead it's going to be used by a wide selection of one-time-only customers --people who get their genotypes tested and just want to use your product once, for themselves only-- it can be good to make everything in the chart as self-explanatory as possible, without requiring the user to consult a separate list/page/paragraph that gives a necessary translation of the symbols into words.

    People who are not used to interpreting large amounts of information are more comfortable with words than symbols or numbers. Even those who are comfortable using symbols and numbers can become confused by their meaning, as evidenced by how many people get confused by what Yasko's "+/-" indicators really mean and have to think it through several times before understanding how to read them.

    Instead of using a "+" symbol to mean that something is more likely, is higher, is beneficial (or any of the other things that a plus symbol might stand for), just put the actual words "high" or "low", or "beneficial" or "negative", or "up" or "down", or "benign" or "risky", etc. That way, the meaning is clear.

    Additionally, and this drives me personally bananas when it comes to Excel, a lot of symbols are read by spreadsheets as indicating the start of a formula, and therefore symbols can be a pain to copy and paste into, or type into, spreadsheets because the spreadsheet does not treat it as unformatted information but imposes an unwanted format onto it (and sometimes Excel can apply a formatting rule and actually remove the value that was originally typed into the cell before the user even knows what has happened). Not everyone knows that putting a single quotation mark before a symbol will protect it from Excel's meddling, but even if they do, mistakes happen and data can disappear or become altered without notice.


    An Example of 1 & 2:

    You can add a column entitled "regulation". The possible results in that column might be (in words): "up", "down", "neutral", "not applicable", or "no data".

    Now, if you would rather not use descriptive words and wish to use symbols, you can still have several columns within the spreadsheet to make the information easier to understand. E.g., you can have a "regulation" column and use an up arrow for up regulation, a down arrow for down regulation, a sideways arrow (with 2 heads on it) for no change in regulation, etc.
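
    The "regulation" column idea can be sketched in a few lines of Python. This is only an illustration of one value per cell, spelled out in words: the short codes, the function names and the placeholder gene and rsID are all made up for the example and are not taken from any existing panel.

```python
# Hypothetical short codes mapped to the plain words shown in the chart.
REGULATION_WORDS = {
    "+": "up",
    "-": "down",
    "0": "neutral",
    "n/a": "not applicable",
}


def regulation_label(code):
    """Return a plain-word label; anything unrecognised becomes 'no data'."""
    if code is None:
        return "no data"
    return REGULATION_WORDS.get(code.strip().lower(), "no data")


def format_row(gene, rsid, genotype, regulation_code):
    """Build one chart row with exactly one piece of data per cell."""
    return [gene, rsid, genotype, regulation_label(regulation_code)]


# Placeholder values only; EXAMPLE1 and rs0000001 are not real identifiers.
row = format_row("EXAMPLE1", "rs0000001", "AG", "+")
print(" | ".join(row))
```

    Because the label is an ordinary word, the cell survives black-and-white printing and pasting into a spreadsheet without any symbol being reinterpreted.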


    3. Colors

    As I wrote above, I think it would be better to indicate everything in words instead of symbols. Beyond the basic problems I think are inherent in using symbols for a "lay" audience, I think using colors in an important graphic provides an additional layer of unnecessary complexity, because many times something can only be printed out or photocopied in black-and-white (such as at a doctor's office or library) and the colors disappear into indiscernible shades of grey, or even obscure entirely the very text they are meant to highlight.

    However, if you are going to use colors to signify anything, I would suggest not replicating ANY of the Yasko (GeneticGenie etc.) colors, because seeing those in your chart would only confuse people.

    Especially if you mean something different with the same color than Yasko means (which I think is your intention regarding the color red - that you intend for it to imply something different than Yasko implies with the color red).

    In other words, in your chart, do not use the color red at all. (Or yellow.)

    Instead of red, for a color that indicates "warmth", perhaps orange would be suitable. (And apropos for your location! :cool: )


    4. Info

    I would recommend including for each SNP the info that I did in my chart, and potentially a lot more: the gene, the variation (that's not the right word, I think, so use whatever is the name of that particular concept), the rs id, the possible alleles, the majority alleles, the minority alleles (you could also include the percentages of worldwide population for the alleles, or get fancier yet and note the allele percentages of the "ethnic" group or country that the person who is being tested is from), the tested person's alleles, and several further columns that explain the tested person's alleles and what that result actually means, if the regulation is constricted or overdriven, if it's a risk and they should be worried about it, if there are medications or lifestyle behaviors that can address it, what other mutations might work together with that one to produce a problematic health result, etc. etc.

    I'm just brainstorming, but I think it would be really useful to have some or all of this summarized in one chart.

    Maybe you could have one output which is just a 3-column chart (if you need a chart that is very narrow and succinct for some reason), and a second output which is the larger chart that gives a one-stop-shop of all the main info the user might want to keep top-of-mind about her/his genetic results.

    To make it really useful as a translation tool, you could put how Yasko reports the allele, how 23andme reports it, how the government or global genome database (or whatever) reports it, etc.
    23andme does this in their raw data area when they show how the user's allele would show up in dbSNP.


    5. Rare SNPs

    As you will remember, today a comment you wrote on another thread led me to discover that I have a comparatively rare allele (which will be known in future biology textbooks as the "Craigbell mutation").

    Rare things can give so much information. They can be the missing link for people who have examined every other "normal" cause of their illness/concern but still haven't solved it.

    I think too much is made of probabilities and expecting most people to fall into a normal category. So many health problems I have were overlooked or pooh-poohed by healthcare workers because they weren't in the top 3 most expected results or didn't fit the main two expected patterns. Rather than asking why the situation/lab reading/symptom was different than expected, they devalued the situation and said it was probably really somehow normal, and left it at that. [My creaking memory is whispering something to me about Type I and Type II errors, but I'm not sure if that's what I'm talking about here or not.]

    Besides, my definition of "rare" for health conditions would be more like one in 50,000, not 1 in 100. :)

    I don't know how many rare SNPs you are talking about including or leaving out, but including them, if you already have the data on them, and if they might have a real effect on the individual's health, could become one of the main differentiators of your chart, versus the others out there.
    Valentijn and LaurieL like this.
  7. Valentijn

    Valentijn Activity Level: 3

    Messages:
    5,589
    Likes:
    7,139
    Amersfoort, Netherlands
    Hi Craigbell Bluebell! Thanks for the thorough response :hug:

    I think this is a good suggestion. Initially I was trying to fit two columns worth of results per row, but that wasn't working, so it's definitely down to one set of results per row. Which means more space for more columns! So I'll think about how I can rearrange things a bit to make results the clearest.


    Good point. I also want it to look as reputable as possible, and colors might detract from that.
    Agreed - I think I may have been over-complicating things by putting in a symbol instead of word. Even "H" or "L" would be nicer than red or blue - though nothing is more awesome than orange! :p
    I'm just quoting this so I can find it easily again.

    Having the full name for the SNP does make it easier to talk about (such as the MTHFR C677T) versus a long number, but not all SNPs have that full name. It's also redundant since the rsID has already identified the SNP. So maybe nice to have, but not a priority - will see how much room is left and go from there :p

    I think that specifying the majority and minority alleles might be too much information - and somewhat confusing for the typical user, since the minor allele might not be the riskier one. But listing both alleles somewhere would be nice. I think listing the prevalence of the user's genotype would also be nice - doesn't take up much space and saves them some leg work. Ethnic group would be too complicated though, since it would require letting the user input information other than their 23andMe results, and me calculating and copying down a ton of information.

    I think much of the other suggestions would have to be in the form of a report, rather than a table, so would be a separate project to consider when this part is done.
    I think I'll stick to reporting it just as the government data base does, with an explanation somewhere on the site that the genetic testing companies often look at the other side of the SNP (or whatever) and that it's automatically translated on my site. Most users won't care too much about the details of that I think.
    My current thought is to have them available, but in a separate panel or a separate category within the same panel. If it was in the same panel (say Methylation), then I could have a category for "Central Methylation Risk Factors" followed by "Rare Central Methylation Genotypes". Or I could have one panel for risk factors associated with all of the processes associated with methylation (it'll be big), and a separate panel for rare genes associated with all of the processes associated with methylation (it'll be bigger). I'm leaning toward separate panels for risk factors versus rarities, to keep things clearer and uncluttered and more reputable.

    As far as frequency to be considered rare, I'm still not sure. My limit has been 5% or less for a genotype, but might take that down to 1-2%. Getting lower isn't practical due to the relatively small sample sizes available (usually 40-200 per batch). I suppose it'll depend on how many rare methylation results I have - if there are hundreds, I'll cut it back to 1 or 2%. But with having rarity as a separate panel from known risks, there will be a lot more wiggle room for spamminess.
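
    The arithmetic behind the rarity cutoff can be sketched as follows. This is a minimal illustration, not the actual panel code: the genotype counts are invented, and the function names and the default 5% threshold are assumptions chosen to mirror the numbers mentioned above.

```python
def genotype_frequencies(counts):
    """Turn genotype counts, e.g. {"GG": 150, "AG": 45, "AA": 5}, into fractions."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no samples in this batch")
    return {genotype: n / total for genotype, n in counts.items()}


def rare_genotypes(counts, threshold=0.05):
    """List genotypes whose frequency in the batch is at or below the threshold."""
    freqs = genotype_frequencies(counts)
    return sorted(g for g, f in freqs.items() if f <= threshold)


def smallest_nonzero_frequency(sample_size):
    """A single carrier is the smallest frequency a batch can show."""
    return 1 / sample_size


counts = {"GG": 150, "AG": 45, "AA": 5}  # invented batch of 200 people
print(rare_genotypes(counts, 0.05))      # AA is 5/200 = 2.5%
print(rare_genotypes(counts, 0.01))      # nothing is at or below 1%
print(smallest_nonzero_frequency(40))    # one person in a batch of 40
```

    With a batch of 40, one carrier already counts as 2.5%, and with 200 one carrier is 0.5%, which is why a cutoff much below 1-2% cannot be told apart from a single person in batches of this size.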

    So I definitely think having the risk versus rare panels separate is a good idea. Though I also think that pointing out when the risk genes are also rare is nice, with a separate column as you suggested for prevalence. I also want to integrate the high/low/risk column(s) somehow, but will leave in the bolding as an additional way to draw attention to the risky results.
  8. Bluebell

    Bluebell More % Neanderthal than Adreno but less hairy :-D

    Messages:
    392
    Likes:
    208
    Another idea: Have your chart and information available in several languages.

    Being the interpreter of 23andme results for people who don't know very much English and therefore find it hard to read what 23andme is reporting about their results would be another valued service, separate from/in addition to detailing the methylation SNPs that 23andme really doesn't cover.

    I haven't noticed any other genetic results services that do this - most seem firmly American and inward-looking - but I haven't been looking for this info in other languages, so I wouldn't have come across it as a matter of course.

    ====
    On this, I take the following view based on my internet research of the last few months: At least with MTHFR, which I expect drives most methylation genetic testing at the present time (and that testing is mainly done on autism sufferers, miscarriage sufferers, and ME/CFS sufferers, I would guess), almost everyone who is not a scientist will have come to recognize these weird-sounding genes that they have learned are really important for their health as "C677T" or "A1298C". They do not seem to go on motherhood or autism discussion forums and say they are "homozygous for rs1801133", but rather "for C677T".

    It may be redundant to list both the full name of the SNP and the rs number, but remember that you are creating a chart that is for the maximum utility of the user, not the creator. ;)



    I think that offering what their specific alleles would look like in the other systems would be more appreciated than you might think.

    A written explanation of how to translate the 4 letters and why the allele can be read backwards isn't the most understandable thing to most people, who never studied much genetics, and showing the images of the 2 or 3 results lined up in a row would be more user-friendly and ultra-time-saving.

    So many people are confused by the way 23andme and Yasko and the government list the same thing different ways (backwards/plus-minus/positive-negative direction, and A=T and C=G, "but sometimes it's not simply a one-for-one translation to the other direction because of the call letter", or whatever!)
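
    The "backwards" reading can be sketched in Python. This is a rough illustration under stated assumptions, not how 23andMe, Yasko or any other service actually computes its calls: it assumes a two-letter genotype and a panel that names one "+" and one "-" allele, and the function names are invented for the example.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip_strand(genotype):
    """Read a genotype on the opposite strand: A<->T and C<->G."""
    return "".join(COMPLEMENT[base] for base in genotype)


def plus_minus_call(genotype, plus, minus):
    """Count '+' alleles and return '-/-', '+/-' or '+/+'.

    If the letters do not match the panel's alleles, the other strand is tried.
    Returns None for anything that is not a two-letter A/C/G/T genotype, when
    neither strand matches, or when the two alleles are each other's
    complement (A/T or C/G), because then both strands use the same letters
    and the direction cannot be worked out from the letters alone.
    """
    if len(genotype) != 2 or any(base not in COMPLEMENT for base in genotype):
        return None
    if COMPLEMENT[plus] == minus:
        return None
    allowed = {plus, minus}
    for candidate in (genotype, flip_strand(genotype)):
        if set(candidate) <= allowed:
            return ["-/-", "+/-", "+/+"][candidate.count(plus)]
    return None


# Alleles and genotypes as they appear in the chart posted later in this thread.
print(plus_minus_call("AG", "T", "C"))  # MTHFR C677T row: AG read as TC
print(plus_minus_call("TT", "G", "A"))  # AHCY-01 row: TT read as AA
print(plus_minus_call("TT", "T", "C"))  # BHMT-02 row: letters already match
```

    Note that the result only counts "+" alleles; as discussed earlier in the thread, it says nothing about whether "+" is actually the risky version.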

    If people are even moderately interested in the subject matter (and most of the folks who are getting this test are VERY invested in getting well, getting pregnant, or helping their child to get well), they won't stick only to your chart when looking at their results - they will want to go look at the other reference sources and discussion forums online, and then they will get confused because no one makes comparing results simple. I've seen many commenters ask for help with this issue, on this site and at other discussion forums like MTHFR.net, etc.

    It's not your job to help people use the other sites properly, or to converse easily with folks who have received their results from the other sources, but it would be a nice addition.

    However, you might have a copyright/trademark problem if you mention the names Yasko or 23andme?

    ---
    It is not my business why you are doing this, who your intended audience is, what your institutional affiliation (if any) is, if it will be for profit or not, if you are planning to cross-sell other services or not, if you are going to build an informational website around the product or simply offer the results chart and that's it, if you are going to link up and cross-promote with other groups or bloggers who operate in this "marketspace"..., and I won't ask, but those are issues that are of course impacting how you design, develop and market your service.

    One thing that I saw mentioned on this forum to someone else who was creating such a gene report was to check into the legal implications of offering medical interpretations or advice (especially if it's for money, but I'd expect even for free). There may also be an across-international-borders complication. Have a lawyer help you to understand your liabilities and to craft a user terms & conditions agreement, if you haven't yet.

    ==
    :ninja: bb
  9. Valentijn

    Valentijn Activity Level: 3

    Messages:
    5,589
    Likes:
    7,139
    Amersfoort, Netherlands
    I'll keep that in mind if I go for automated interpretations of results. But if it's just a word or two (high, low, risk, rare) then that's very easy to check in Google Translate and such.

    True. But then when there's no "C677T" they refer to it as the gene abbreviation, which can be confusing. I'll see if there's a standard for when they use "-01,-02,-19" and such.
    In good conscience, I can't make it compatible with Yasko interpretations. Too many of them are simply wrong or meaningless. And how do you succinctly explain that real risk is GG, but Yasko says AA, especially if I'm using C/T and Yasko is using G/A? I'd rather have my results stand alone (with cited research), and anyone that's up for getting into the nitty gritty details of C versus G can read a brief explanation or do their own research.

    I really think the vast majority won't care how I write it, versus how Yasko results are written, versus how 23andMe reports it. And it would be too spammy and complicated, both for me and for my programmer (Mr Valentijn). I also don't want to be responsible for keeping up with how other services are reporting things - I'm not interested in checking up on whether Yasko or Geneticgenie have changed from C to G!
    Talking about them (basically like we are here) won't be an issue, though I will make sure to have a clear disclaimer that I'm not associated with them at all.
    I think that is relevant :p But it's just going to be a free service designed for ME patients, though somewhat applicable to other people as well. We're not putting any money into designing it (fiance does the programming, we already have a server available to host it on, etc) so no need to make any money from it. Also, I think that the ME community has been incredibly helpful to me, and it would be unethical to try to make money off of other patients, especially ones who often don't have access to doctors or proper treatment and care.
    Yeah, plenty of disclaimers and language to make it clear that any interpretations are not medical advice, etc etc. If I even do interpretations - that's a different project for the future, which will be quite big and intense.

    And luckily I'm a licensed attorney, and can give myself free legal advice :D
  10. Bluebell

    Bluebell More % Neanderthal than Adreno but less hairy :-D

    Messages:
    392
    Likes:
    208
    'Mr. Valentine' is just too perfect a name for a fiance. :love:

    ====
    > Including results as would be shown by Yasko/GeneticGenie/whichever else:

    Ahhhh -- I had not thought about the burden of keeping up with any changes that occur in Yasko's/others' systems - I can see that it would be unworkable for you!

    I don't agree that you would have had to explain the reasoning behind those usages of Yasko's which don't match up with other systems' - just having them juxtaposed alongside the other systems' depictions of genetic test results that align more with published findings and representational norms would be indication enough that something was a bit odd there.

    ====
    > Gene abbreviation, standard for when they use "-01,-02,-19" and such

    Nearly all the gene snps that I had in my results chart either had what I called the "variation" name or a number after it like "-08". I expect you already have all of those "variation" names, but in case it might assist you to see them in one place, I'll put the full version of my chart here (including the SNPs that I had the "majority" allele for, which I had left off the chart that I appended to my thread a few days ago).

    I put my chart together myself after researching what methylation-cycle-related rs numbers a number of other people on various sites had reported on -- I did not put my 23andme data through any third-party number-crunching program like Promethease or Geneticgenie because I didn't judge the potential loss of privacy to be worth it. [The existence of any privacy at any time, anywhere, anymore is debatable, but I'm old-fashioned like that. :)]

    Yesterday, I skimmed maybe 30 research articles on various SNPs and medical conditions, and, frustratingly, so many of the articles didn't include the actual rs number of the SNPs that were studied (at least, anywhere that I could see). Some were even from 2012... I don't know when the rs number became a standard identifier - maybe only recently. Because these things are still being referred to in several different ways, I do think it would be useful to have the other names for each rs number listed alongside it.

    ====
    I laud your reasons for creating your program and your ideals about providing free access and only presenting highly-reputable information. :angel:

    ====
    ...I guess somehow you have to begin neutralizing all the negative karma points you've racked up for being a lawyer. :devil:




    {That was a joke! :D }

    Gene Variation | rsID | Alleles | Major Alleles | My Alleles | My Result
    ACAT1-02 | rs3741049 | +T/-C | GG | AA | +/+
    ACE Del 16 | a deletion, not a SNP | | | not found |
    AHCY "Craigbell" | rs13043752 | +A/-G | GG | AG | +/-
    AHCY-01 | rs819147 | +G/-A | TT | TT | -/-
    AHCY-02 | rs819134 | +G/-A | AA | AA | -/-
    AHCY-19 | rs819171 | +G/-A | TT | TT | -/-
    BHMT-01 | rs585800 (or rs492842?) | +T/-A | | not found |
    BHMT-02 | rs567754 | +T/-C | CC | TT | +/+
    BHMT-04 | rs617219 | +C/-A | AA | CC | +/+
    BHMT-08 | rs651852 | +T/-C | CC | TT | +/+
    CBS A360A (C1080T) | rs1801181 | +T/-C | GG | AG | +/-
    CBS C699T (Y233Y) | rs234706 | +A/-G | GG | GG | -/-
    CBS N212N | rs2298758 | +?/-G | GG | GG | -/-
    COMT H62H | rs4633 | +T/-C | CC | CT | +/-
    COMT L136L | rs4818 | +C/-G | | |
    COMT V158M | rs4680 | +A/-G | GG | AG | +/-
    COMT-61 P199P | rs769224 | +A/-G | GG | GG | -/-
    MAO A R297R | rs6323 | +T/-G | G | TT | +/+
    MTHFR A1298C (E429A) | rs1801131 | +C/-A | TT | GT | +/-
    MTHFR C677T (A222V) | rs1801133 | +T/-C | GG | AG | +/-
    MTHFR-03 P39P | rs2066470 | +T/-C | GG | GG | -/-
    MTR A2756G (A919G) | rs1805087 | +G/-A | AA | AA | -/-
    MTRR A66G (A919G) | rs1801394 | +G/-A | AA | AG | +/-
    MTRR H595Y | rs10380 | +T/-C | CC | CC | -/-
    MTRR K350A | rs162036 | +G/-A | AA | AA | -/-
    MTRR R415T | rs2287780 | +T/-C | CC | CC | -/-
    MTRR S257T | rs2303080 | +A/-T | | not found | not found
    MTRR-11 A664A | rs1802059 | +A/-G | GG | AG | +/-
    NOS 3 D298E (G894T) | rs1799983 | +T/-G | | not found | not found
    SHMT1 C1420T | rs1979277 | +A/-G | GG | GG | -/-
    SHMT2 | | | | |
    SUOX A628G | rs7297662 | +G/-A | | |
    SUOX S370S | unknown | +C/-T | unknown | not found |
    VDR Bsm | rs1544410 | +A/-G | CC | CC | -/-
    VDR Fok-l | rs10735810 [rs2228570 (FokI) (Met1Thr, formerly known as rs10735810)] | +T/-C | | not found |
    VDR Taq | rs731236 | +G/-A (Yasko +A/-G) | AA | AA | -/-
    Valentijn likes this.
  11. Valentijn

    Valentijn Activity Level: 3

    Messages:
    5,589
    Likes:
    7,139
    Amersfoort, Netherlands
    This was in a different thread, but it does present an interesting problem.

    Maybe I can put the SNP-sorting program(s) and associated data files up for download somewhere on the site. Then people can download the program itself and do the sorting on their own PC, thus ensuring that no one else sees their results.

    Though a more basic step that can be taken is for the patient to rename their 23andMe genetic file, so their full name isn't being given to the sites they run it through when the file is uploaded. But that still leaves the patient's IP address at the sites, which some people won't be comfortable with.
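
    Running the sorting locally could look roughly like the sketch below. It assumes the usual layout of a 23andMe raw data download (comment lines starting with "#", then tab-separated rsid, chromosome, position and genotype); the function names, the placeholder rsIDs and the file name in the final comment are hypothetical, and this is not the actual program discussed here.

```python
def read_panel(lines, panel_rsids):
    """Return {rsid: genotype} for the wanted rsIDs from raw data lines.

    Assumed layout: comment lines start with '#'; data lines are
    tab-separated as rsid, chromosome, position, genotype.
    """
    wanted = set(panel_rsids)
    found = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # header comments are skipped, never copied anywhere
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # ignore lines that do not look like data
        rsid, _chromosome, _position, genotype = fields[:4]
        if rsid in wanted:
            found[rsid] = genotype
    return found


def report(lines, panel_rsids):
    """List (rsid, genotype) in panel order, marking missing SNPs as 'not found'."""
    found = read_panel(lines, panel_rsids)
    return [(rsid, found.get(rsid, "not found")) for rsid in panel_rsids]


# Made-up lines standing in for a raw data file; the rsIDs are placeholders.
sample = [
    "# made-up example, not real data",
    "# rsid\tchromosome\tposition\tgenotype",
    "rs0000001\t1\t1000\tAG",
    "rs0000002\t2\t2000\tCC",
]
for rsid, genotype in report(sample, ["rs0000001", "rs0000003"]):
    print(rsid, genotype)

# On your own PC the same function can read the file directly, e.g.:
# with open("genome.txt") as fh:   # hypothetical file name
#     print(report(fh, ["rs0000001"]))
```

    Nothing in this sketch touches the network, so neither the file name nor the results leave the computer it runs on.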
    ukxmrv likes this.
  12. Bluebell

    Bluebell More % Neanderthal than Adreno but less hairy :-D

    Messages:
    392
    Likes:
    208
    Yes, I did rename one copy of my file to "anonymous" (and went inside it and took my name off of the start of it, although I don't know if my name appears anywhere else) when I was considering uploading it somewhere, but I decided not to use any of the external programs in the end.

    There is one site I have seen that will let you download their analysis program to your own computer (might be Promethease, for $2?) but they warn that it takes 4 hours to run, and you also don't get their personalized report when you do it that way.

    However you configure your program, please make it less complicated to use than that rare snps program you tried the other day! (Which uncovered the 'Valentijn-1' mutation.) It had too many steps for the average person - or even for the rare person. :snigger:

    I don't quite see how the red eternity symbol on the little emoticon indicates a "snigger" - kind of looks like a drunken red lipstick application to me. :ninja:

    I would be a lot more comfortable with using your program if I could run it in the closed confines of my own computer.

    As you know, it's not that one completely mistrusts another, it's that datasets big and small get compromised all the time, beyond the control of the folks/companies who are storing the data.

    Though I wouldn't put it past some people to collect this kind of thing and then release the info ("anonymized"), use it for a research study, etc. Even the NHS in the UK is doing this now with patients' health files - giving them to private companies to analyse and profit from.

    Sadly, even the most bizarre science-fiction plotline (for example, rounding up everyone who has a certain mutation and sterilizing them or confining them or making them worker-slaves or studhorses or something) doesn't seem out of the realm of possibility for what our society might morph into 20 or 30 years hence. Things can change FAST when the right financial, environmental, psychological, and ethical conditions and constraints are in place. The books I read in high school English (and the teeny tiny bit of History that my school system saw fit to teach) that described worlds that then seemed a million miles off now feel quite sane and portentous, from _The Jungle_ to _1984_.

    But anyway, I veered off there for a minute. :whistle:
    Valentijn likes this.
  13. Bluebell

    Bluebell More % Neanderthal than Adreno but less hairy :-D

    Messages:
    392
    Likes:
    208
    Besides, since 23andMe was founded by and is run by the wife of the man who runs Google, and is funded in part by him and Google, I don't expect them to be a paragon of privacy, whatever they might proclaim.

    Maybe when the going gets tough, I can play on the twinningness between zillionaire Craig and myself, and get a free pass to the illuminati fortress island or however everything's going to shake down.

    No, I'm not a lunatic*. :p Maybe my detox is starting, even before I have taken my first methylation supplements!

    I think I need to get the coffee ice cream from the freezer now. :zippit:

    ----
    *if you are reading this and have no idea what I'm talking about, I'm mentioning something that was a joke between Valentijn and me. So there is *some* kind of sense in the gobbledygook that I'm spouting, though not much. :)
    Valentijn likes this.
  14. kday

    kday Senior Member

    Messages:
    259
    Likes:
    46
    I've been working with the 23andMe API which is a more secure way to access data and doesn't require a file upload. I also enabled SSL when dealing with any info that the user may want private.

    I am just waiting for approval from the 23andMe API team. I restructured the programming to be able to implement charts such as the one you posted very rapidly. If you would like to work on this together for web output, you can PM me. Web content and layout (and a web version for mobile devices) are being redone as well, but I am trying to focus on one thing at a time. Strangely enough, the majority of the visitors come on a mobile device - typically iOS. I can't figure out why this is, but a mobile interface seems important.

    If you want to do this alone, I understand. But I'm more than willing to help you out.

    I'm not sure if this is your most recent thread on this topic.
    Valentijn likes this.
  15. Valentijn

    Valentijn Activity Level: 3

    Messages:
    5,589
    Likes:
    7,139
    Amersfoort, Netherlands
    Just as a brief update, we had some time to work on things last night, and are making good progress on a downloadable program for people to run if they want to find their rare alleles without transmitting their entire 23andMe file over the internet.

    So now we've got a script to pull out the rare results from the (bleep)ing huge genetic data files, and to then pull out the 23andMe results from the rare results, which should result in a single file which has all of the 23andMe SNPs which have rare results.

    We're looking at alleles with frequency of 3% or lower, and so far the 23andMe rare files we're generating for even the biggest chromosomes are looking small, so putting those into a single file should still leave it small enough to download quickly.

    The next step will be to make a program that compares the rare results files to patients' full 23andMe files. This will also be downloadable, and I'm intending that there be different options that the user can select, to see their rare alleles that have a prevalence of 3%, 2%, or 1% and under.

    Then we'll adapt the program for web use, so people can either run the program on our website by uploading their 23andMe file, or they can download the program and rare allele file to run the program on their own computer without needing to upload their 23andMe file to the website.
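    As a rough illustration of the comparison step described above, here is a minimal sketch. It is not the actual program: the layout of the rare-allele file (rsid, allele, frequency, tab-separated), the function names load_rare and find_rare, and all of the sample data are assumptions made up for this example. The 23andMe side assumes the usual tab-separated rsid, chromosome, position, genotype rows with '#' comment lines.

```python
import io

def load_rare(src, max_freq):
    """Read hypothetical 'rsid<TAB>allele<TAB>frequency' rows at or below max_freq."""
    rare = {}
    for line in src:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        rsid, allele, freq = line.rstrip("\n").split("\t")
        if float(freq) <= max_freq:
            rare.setdefault(rsid, {})[allele] = float(freq)
    return rare

def find_rare(genome, rare):
    """Yield (rsid, genotype, allele, freq) for each rare allele a genotype carries."""
    for line in genome:
        if line.startswith("#") or not line.strip():
            continue
        rsid, _chrom, _pos, genotype = line.rstrip("\n").split("\t")
        for allele, freq in rare.get(rsid, {}).items():
            if allele in genotype:
                yield rsid, genotype, allele, freq

# Invented sample data, for illustration only.
RARE = (
    "rs0000001\tT\t0.025\n"
    "rs0000002\tG\t0.008\n"
    "rs0000003\tA\t0.2\n"
)
GENOME = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tCT\n"
    "rs0000002\t2\t2000\tAA\n"
    "rs0000003\t3\t3000\tAA\n"
    "rs0000004\t4\t4000\tGG\n"
)

# The user-selectable threshold: 3% and under, then 1% and under.
hits_3 = list(find_rare(io.StringIO(GENOME), load_rare(io.StringIO(RARE), 0.03)))
hits_1 = list(find_rare(io.StringIO(GENOME), load_rare(io.StringIO(RARE), 0.01)))
print(hits_3)
print(hits_1)
```

    Everything here runs locally, so the full genetic file never has to leave the user's computer; only the small rare-allele file would need to be downloaded.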
    ukxmrv likes this.
  16. bel canto

    bel canto Senior Member

    Messages:
    187
    Likes:
    169
    Val - that is a great service to us all! Thanks to you and Mr. Val!!

    Do you know how Promethease gets the data they describe as "unique to you" after running our 23andMe files? I could not match up several items from Ian Logan's program, and I'd like to understand how they are different.
  17. Valentijn

    Valentijn Activity Level: 3

    Messages:
    5,589
    Likes:
    7,139
    Amersfoort, Netherlands
    No idea, but at least some of the Promethease prevalence data seems wrong (it's showing 0% for my genotype when allele prevalence is 20%). Ian Logan is getting his from the 1000 Genomes Project, which is also where I'm getting it from.
  18. bel canto

    bel canto Senior Member

    Messages:
    187
    Likes:
    169
    Thanks. I'll know where to focus my limited efforts.:)
  19. kday

    kday Senior Member

    Messages:
    259
    Likes:
    46
    It's worth noting that GEDmatch found rare SNPs last I looked. Not the most user-friendly interface ever, but it seemed to do the job well, and the site owner has put some of these powerful features up and taken them down over time.

    I tried making a rare SNP web tool using a very large database developed by a University (Harvard?). It's no easy task, and it consumed a tremendous amount of resources. Never got it working right anyway and gave up.

    It would be nice to have a cross-platform program to do the crunching though.
    Valentijn likes this.
  20. kday

    kday Senior Member

    Messages:
    259
    Likes:
    46
    FYI: GEDmatch has a much more friendly interface now. Just looked at it and it has changed since I last used it.
    Sea likes this.
