Ion Channel SNP Paper

alex3619 · May 12, 2015

SNPs are, at this point, best thought of as risk factors for ME in my opinion, not causes. If they caused it, then all or most people with those SNPs would get ill. It's also possible that ME patients have many such risk factors. The combined total for a healthy person is probably not zero, but for many of us it's high, and when the right triggers come along it tips us over the edge. It's very difficult to speak of causation with any reliability at this point.

Valentijn · May 12, 2015

nandixon said:
It's not so important, but just to mention first that CBS C699T is not a non-coding variant. It's a coding SNP in exon 8 of the CBS gene.

Sort of. It's in the coding section, but is synonymous. So it doesn't result in any coding change, ever.

I've read enough SNP-disease correlation papers to feel pretty confident that you're incorrect about what you're thinking here. There are plenty of non-coding SNPs in the untranslated regions (3'-UTR, 5'-UTR), as well as intronic SNPs (e.g., affecting RNA splicing), of genes that can easily have the same degree of detrimental impact on health as coding SNPs like C677T in the MTHFR gene.

Can you name one which has a similar impact - say 50% or greater impact on gene function? I haven't seen any.

Taking introns as an example, here's an intronic SNP that was shown - in a functional assay - to cause neurofibromatosis:
Functional splicing assay shows a pathogenic intronic mutation in neurofibromatosis type 1 (NF1) due to intronic sequence exonization

1) This is a single case study involving a single patient. It's not uncommon for innocent SNPs to be implicated in these sorts of studies, even with missense mutations, then later found to be a bit more common and harmless than the study suggests.

2) They used non-standard (new) procedures to do the analysis, so it's difficult to tell if their methods were valid or not.

3) As best I can understand, the variant is suspected of either creating an additional exon or removing one. I'm not sure why they were not able to verify this by looking at the actual proteins in the patient - that should be relatively simple.

And of course, it's well known that UTR SNPs are heavily involved in, e.g., diabetes:

The Role of Single Nucleotide Polymorphisms of Untranslated Regions (UTRs) in Insulin Resistance Pathogenesis in Patients with Type 2 Diabetes

Yes, UTRs are quite a bit more interesting than the rest of the intron areas. But those aren't what these CFS SNP studies are discussing, nor is most of the correlating research with weak or inconsistent associations. I'm also not a big fan of "it's well known" in general, even if some guy has published a paper saying so.

I think you're thinking that we're looking for a SNP(s) as a cause for ME/CFS that is of such great significance that it's going to show up even in a cross-ethnic genome analysis and that it's going to need to be a coding SNP. That might happen, but I doubt it at this point. I think it's much more likely, assuming there is an underlying genetic component to our disease, that it will be a SNP(s) that has an effect not on the function of the resulting protein/enzyme, but rather is somehow susceptible to causing dysregulation of the gene. Just my opinion.

I think the only thing we can assume at this point is that we have no idea if or where problems will be found, nor what type of problems they will be. Turning every SNP with a weak association into a potential culprit is not at all useful outside of a research context.

adreno · May 12, 2015

And let's not forget a very important factor here: the microbiome. Trying to directly match genotype to phenotype without factoring in the microbiome (which outnumbers human cells ten times), seems like trying to solve an equation with 90% of the factors unknown. The microbiome functions as an "extended genome" and affects epigenetic processes including gene expression, protein folding, etc. It also has more direct regulatory effects on signaling molecules such as cytokines, hormones and neurotransmitters.

The past decade in human genetic disease has been dominated by genome-wide association studies and, more recently, by sequencing human genomes. The goal is to discover disease genes by identifying variants that cause or increase the risk of disease. In genome-wide association studies, patients with disease are compared with unaffected controls. DNA is collected from both groups, and approximately a million genetic variants are surveyed in their genomes. Variants that are at a higher frequency in patients than in controls are associated with an increased risk of disease. More than a thousand disease-associated variants have been identified from over 400 genome-wide association studies, but much is still unknown about the genetic heritability of disease. For example, genome-wide association studies have identified 40 markers associated with height, but these markers only account for ∼5% of height’s heritability (Maher, 2008). Despite studying thousands of individuals, <10% of heritability is explained for most diseases (Manolio et al., 2009).

Individual human genomes have been sequenced, and there are approximately 3 million to 4 million variations with respect to the reference genome (Frazer et al., 2009). It is thought that some of these variants will cause phenotypic differences that can lead to disease or apparent physical traits. It is estimated that 3–8% of the human genome is functional (Siepel et al., 2005), so it is unlikely that all the variation in the 3 Gb human genome will lead to phenotypic differences. Rather, functional variants may be localized to the 90–240 Mb of human genome that contains transcribed coding genes, regulatory elements, RNA genes, and other functional elements.

By contrast, the human microbiome has extensive diversity. Each location (skin, mouth, intestine, etc.) has its own metagenome. Recent studies have suggested that healthy individuals have up to 15,000 species-level phylotypes in their gastrointestinal tracts as determined by 16S rRNA sequencing (>97% identity) (Peterson et al., 2008) (Fig. 1.1 of this chapter), and that the two major phylogenetic groups present are the Firmicutes and Bacteroidetes. The average genome size of sequenced organisms from these groups is 3.4 Mb (Liolios et al., 2008), and the percentage of these genomes that codes for protein-coding genes is approximately 92%. Therefore, the functional part of the gastrointestinal microbiome can be estimated to be approximately 47,000 Mb (15,000 × 3.4 × 0.92), which is more than two orders of magnitude greater than the above-mentioned estimate of the functional part of the human genome.
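For anyone who wants to check the arithmetic in the quoted passage, here is a quick sketch. The constants are just the figures quoted above from the chapter (3 Gb genome, 3–8% functional, 15,000 phylotypes, 3.4 Mb average genome, 92% coding); the function names are made up for illustration.

```python
# Back-of-envelope figures taken from the quoted chapter text.
HUMAN_GENOME_MB = 3_000             # "3 Gb human genome"
FUNCTIONAL_FRACTION = (0.03, 0.08)  # "3-8% of the human genome is functional"

GUT_PHYLOTYPES = 15_000  # species-level phylotypes (upper estimate)
AVG_GENOME_MB = 3.4      # average Firmicutes/Bacteroidetes genome size
CODING_FRACTION = 0.92   # share of those genomes that is protein-coding


def human_functional_range_mb():
    """Return (low, high) estimate of the functional human genome in Mb."""
    low, high = FUNCTIONAL_FRACTION
    return HUMAN_GENOME_MB * low, HUMAN_GENOME_MB * high


def gut_functional_mb():
    """Estimate the protein-coding part of the gut metagenome in Mb."""
    return GUT_PHYLOTYPES * AVG_GENOME_MB * CODING_FRACTION


if __name__ == "__main__":
    low, high = human_functional_range_mb()
    gut = gut_functional_mb()
    print(f"human functional: {low:.0f}-{high:.0f} Mb")
    print(f"gut functional:   {gut:,.0f} Mb")
    # Compare against both ends of the human range.
    print(f"ratio: {gut / high:.0f}x to {gut / low:.0f}x")
```

The gut figure comes out at about 46,920 Mb, roughly 200 to 500 times the 90–240 Mb human estimate, which is where "more than two orders of magnitude" comes from. These are the chapter's own rough inputs, so the result is only as good as they are.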

http://ttaxus.com/files/Badger2011-HumanGenomeMicrobiome.pdf

user9876 · May 12, 2015

Jonathan Edwards said:
As I see it each SNP is an indicator of a variant of the gene for a protein that might encode a form of the protein with a slightly stronger or weaker (or more or less specific etc.) function. So it is a bit like if you have SNP 012345678 then that means that you make a variant of a protein (in common with some proportion of the population and not others) that might be a little bit e.g. 'stronger' than in the other people. If this affects a threshold for tripping a feedback loop then it can confer disease risk. The same feedback loop might be tripped by all sorts of variants in either the same protein or one of a family of similar proteins or even an unrelated protein. By analogy with a fuel that makes a car backfire, you might be predisposed to this if you had a SNP that goes with a particular octane rating for the petrol or a SNP that goes with not enough lead additive, etc.

The implication is that all the SNPs with higher frequencies in the patients might go with gene variants that encode for loop-tripping variant proteins. You might then ask what one would expect to find in terms of combinations of SNPs in individuals. I think things get very complicated here because some SNPs go around linked together, as Valentijn points out. Moreover, since a loop trip is a loop trip, there may be no particular increase in people with a double dose of loop-tripping SNPs - or there might be. But it is doubtful that the data would be firm enough to start testing that.

I am not sure whether that will make much more sense than the paper but hopefully it will not just confuse more!

Thanks, I think I may be starting to get it. I will try to put what I'm understanding into words to check:

So the idea is that there are slight variations in gene sequences (one letter) which make up different forms of an SNP which is some form of unit in a gene sequence. Then we know that certain genes affect certain proteins in the body that are linked to particular parts of some cycle. Hence when looking at what is different about such genes in patients as compared to controls then it may suggest some people are predisposed but also may give hints as to mechanisms due to the links between predisposition and cycles in the body.

So what we have in this paper is a comparison between a control and patient set and it seems to identify TRPM3 particularly as important and perhaps some others. Within each SNP they seem to compare different forms (e.g. A vs C) in the samples they have.

From a stats perspective, the question is whether the control and patient sets are large enough to characterise the natural variation. So if I have a population P with several million people and pull 90 into a set C, is the set C a good representation of the overall population P? Having just had an election here, this is what the pollsters do, but they do that with knowledge of the distribution of various socio-economic factors that affect voting. I don't see how we can know if the 90(ish) controls make up a good representation of the overall population. Hence, when comparing with a set of patients, I don't see how we can know if the patients are different from the norm. But there may well be ways. I suspect it all comes down to sampling theory.
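One way to put a number on the "is 90 enough?" question is the standard error of an estimated allele frequency. This is a minimal sketch, assuming the controls are a simple random sample and that each person contributes two independent alleles (so 90 people give 180 alleles). The 30% allele frequency is hypothetical, not a figure from the paper.

```python
import math


def allele_freq_ci(p_hat: float, n_people: int, z: float = 1.96):
    """Approximate 95% (Wald) confidence interval for an allele frequency.

    Assumes a simple random sample and two independent alleles per person,
    so the effective sample size is 2 * n_people.
    """
    n_alleles = 2 * n_people
    se = math.sqrt(p_hat * (1 - p_hat) / n_alleles)
    return max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)


if __name__ == "__main__":
    # Hypothetical allele frequency of 0.30, for illustration only.
    for n in (90, 1_000, 10_000):
        lo, hi = allele_freq_ci(0.30, n)
        print(f"n={n:>6}: {lo:.3f} - {hi:.3f}")
```

With 90 controls, the interval around a 30% allele is roughly ±6.7 percentage points, so fairly modest case/control differences can sit inside the noise. Note this only captures random sampling error; it says nothing about whether the controls are representative in the first place, which is the pollster problem above.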

There is another question, which is what the distribution over the patient population looks like. In the paper they appear to present pairwise comparisons of mutations; what would be nice to see is the relative frequency of the different mutations, and hence whether one is missing that may be more common in the overall population (so compare the different relative frequencies). I guess, as usual, I would like a different data presentation.

I get the point about it being hard to look for jointly acting SNPs since they are very complex interactions. One interesting thing may be some form of decision tree analysis. So if there were a set of patients with normal mutations (looking at the relative frequencies) for say TRPM3 then if we were to just take these groups do they have very different patterns over a different SNP. I guess that is some form of sub-grouping.

Data mining could be done on data like this, and I think that is valid to help form a hypothesis. But then further tests need to be done to validate the hypothesis, both looking at SNPs in larger groups and trying to understand the implications of the cycles and testing these in other ways.

Hopefully some of that might make a little bit of sense.

Valentijn · May 12, 2015

nandixon said:
Global allelic frequencies aren't relevant for these types of analyses. Ethnic specific data must be used. See my post here for more information:

http://forums.phoenixrising.me/inde...an-extremely-rare-mutation.37366/#post-593923

The problem is that there isn't a large amount of ethnic-specific data available yet. Even CEU groups can be inapplicable to most Europeans, for example, as they are a very specific subset. I think both global and approximate ethnic variations can be useful in seeing what's normal, however. If 25% of the world is homozygous for a variation, it's extremely unlikely that it's doing anything problematic even in groups where it's very rare.

lansbergen · May 12, 2015

Valentijn said:
Turning every SNP with a weak association into a potential culprit is not at all useful outside of a research context.

I agree.

Valentijn said:
3) As best I can understand, the variant is suspected of either creating an additional exon or removing one. I'm not sure why they were not able to verify this by looking at the actual proteins in the patient - that should be relatively simple.

Yes, why not?

barbc56 · May 12, 2015

I certainly wish I understood this better. I don't know if this question is relevant.

Are the findings compared to patients with other diseases? If not, wouldn't that be important information?

Barb

nandixon · May 12, 2015

Valentijn said:
Sort of. It's in the coding section, but is synonymous. So it doesn't result in any coding change, ever.

Yes, but it is a coding SNP and, despite creating an identical protein, the "risk" allele has been definitively shown to increase CBS activity (which, while beneficial for purposes of lowering homocysteine could, in theory, disrupt the transsulfuration pathway).

Can you name one which has a similar impact - say 50% or greater impact on gene function? I haven't seen any.

Yes, the "single case study" you're describing below. ("The mutation, showed aberrant splicing with approximately 45% normal transcripts containing exon 30 and 55% an aberrant transcript of 604 bp.")

1) This is a single case study involving a single patient. It's not uncommon for innocent SNPs to be implicated in these sorts of studies, even with missense mutations, then later found to be a bit more common and harmless than the study suggests.

2) They used non-standard (new) procedures to do the analysis, so it's difficult to tell if their methods were valid or not.

3) As best I can understand, the variant is suspected of either creating an additional exon or removing one. I'm not sure why they were not able to verify this by looking at the actual proteins in the patient - that should be relatively simple.

They effectively did this using a hybrid minigene.

Yes, UTRs are quite a bit more interesting than the rest of the intron areas. But those aren't what these CFS SNP studies are discussing, nor is most of the correlating research with weak or inconsistent associations. I'm also not a big fan of "it's well known" in general, even if some guy has published a paper saying so.

I think the only thing we can assume at this point is that we have no idea if or where problems will be found, nor what type of problems they will be. Turning every SNP with a weak association into a potential culprit is not at all useful outside of a research context.

It doesn't have to be turned into a culprit, merely quickly checked to see if it validates/"replicates" and, if not, discarded. Much better that than never finding what you're looking for at all.

nandixon · May 12, 2015

Valentijn said:
The problem is that there isn't a large amount of ethnic-specific data available yet. Even CEU groups can be inapplicable to most Europeans, for example, as they are a very specific subset. I think both global and approximate ethnic variations can be useful in seeing what's normal, however. If 25% of the world is homozygous for a variation, it's extremely unlikely that it's doing anything problematic even in groups where it's very rare.

It's much better to use a somewhat specific European ethnic group in the form of CEU, which actually contains the ancestral genomes of northern and western Europeans, than to use the "world" in the form of the full 1000 Genomes dataset, which contains 50% Asian genomes - and which can vary grossly in minor allele frequencies from that found in Europeans, CEU or otherwise.

Valentijn · May 12, 2015

barbc56 said:
I certainly wish I understood this better. I don't know if this question is relevant.

Are the findings compared to patients with other diseases? If not, wouldn't that be important information?

It would be relevant if they had found a potential biomarker. But since the results they found aren't even distinguishable from the general population, I think they can go in the "junk" pile long before we get to the point where we think about comparisons between diseases.

Comparisons could also be relevant if multiple diseases had a similar pathway or underlying susceptibilities.

Jonathan Edwards · May 12, 2015

user9876 said:
Thanks, I think I may be starting to get it. I will try to put what I'm understanding into words to check:...

Hopefully some of that might make a little bit of sense.

That sounds on track to me.

One thing I might add in the light of the discussion is that my understanding of SNPs in this context is that all they need to be are arbitrary 'fingerprints' of a particular gene allele, or maybe of a chromosome segment of several closely linked genes with fixed alleles. So the SNP might only lie in gene P if it is of the P(a) form (assuming only one historical origin for P(a) in evolution), or in gene P in a segment of genes PQRS only when P is P(a), and hence (because the segment does not split up) only in the segment P(a)Q(c)R(k)S(b). So even if the SNP lies in a completely functionless part of the P gene's promoter sequence, that does not matter if P(a), Q(c) or S(b) has a risk-prone function. SNPs in exons that are not synonymous are obviously born to be functional, but I see no reason why SNPs in silent areas should not be associated with major risk, even if nobody has a good example yet.
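The "fingerprint" idea is what geneticists quantify as linkage disequilibrium: how strongly an allele at one site predicts the allele at a nearby site on the same chromosome. A minimal sketch of the usual r² measure, using made-up haplotype frequencies (the function name and numbers are illustrative, not from the paper):

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Linkage-disequilibrium r^2 between allele A (site 1) and allele B (site 2).

    p_ab: frequency of chromosomes carrying both A and B
    p_a:  frequency of A at site 1
    p_b:  frequency of B at site 2
    """
    d = p_ab - p_a * p_b  # D: excess of A-B haplotypes over independence
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


if __name__ == "__main__":
    # Hypothetical: a silent marker allele B and a functional allele A, both at 20%.
    print("always inherited together:", ld_r2(0.20, 0.20, 0.20))
    print("inherited independently:  ", ld_r2(0.04, 0.20, 0.20))
    print("partially linked:         ", ld_r2(0.12, 0.20, 0.20))
```

An r² near 1 means the silent marker is an almost perfect stand-in for the functional allele, which is why a SNP in a "functionless" stretch can still show up as associated with risk.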

Jonathan Edwards · May 12, 2015

I am puzzled. I did a 'quick chi square' using software on the net for the first SNP and got a chi square of 5.1 and uncorrected p value of 0.02. The paper gives a chi square of 8 and p value 0.003 for this. I think I did something wrong but that wrong?

Jonathan Edwards · May 12, 2015

Valentijn said:
I'm afraid that my only contribution to a repository regarding genetic research into ME at this point would involve a rude emoticon

Maybe your little collection has proved rather more useful than that already! My repository would include unpublished collections - and they might well be in the majority.

aimossy · May 12, 2015

@user9876 your above explaining of your thinking really helped me out with understanding this SNP "stuff" more. Thank you.

Simon · May 12, 2015

Jonathan Edwards said:
I am puzzled. I did a 'quick chi square' using software on the net for the first SNP and got a chi square of 5.1 and uncorrected p value of 0.02. The paper gives a chi square of 8 and p value 0.003 for this. I think I did something wrong but that wrong?

I did that too. Sometime later I remembered cells are diploid (but not before I'd asked a mathematician friend if he could see where I'd gone wrong - cue Homer Simpson moment). However, still got p values that were not much different from those in the paper (just a little lower), which made me wonder about the correction used.

Jonathan Edwards · May 12, 2015

Simon said:
I did that too. Sometime later I remembered cells are diploid (but not before I'd asked a mathematician friend if he could see where I'd gone wrong - cue Homer Simpson moment). However, still got p values that were not much different from those in the paper (just a little lower), which made me wonder about the correction used.

How does the diploidy come in to it Simon?

Simon · May 12, 2015

Jonathan Edwards said:
How does the diploidy come in to it Simon?

115 patients have 230 SNP versions. Doubling up the numbers in both cases and controls increases the significance of the differences, pushing the p values down (by a factor of nearly ten). It's a bit like comparing a 50% performance difference in a trial of 50 patients on 2 different therapies with a 50% difference in a trial with 100 patients. The bigger study will have a lower p value for the same difference.

edit: try the numbers in an online calculator, that's what I did
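Simon's doubling point can be checked with a small sketch. It computes a plain Pearson chi-square for a 2x2 table (no Yates continuity correction, which is an assumption; the paper's exact settings aren't stated in this thread) and the 1-degree-of-freedom p value via the complementary error function. The counts are made up, not the paper's; the second table simply doubles every cell while keeping the proportions fixed.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Rows are cases and controls; columns are "allele X" / "other allele".
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_1df(chi2: float) -> float:
    """Upper-tail p value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical counts, NOT the paper's data.
    original = (30, 85, 12, 78)
    # Same proportions with every cell doubled, as when counting
    # two chromosomes per person instead of one entry per person.
    doubled = tuple(2 * x for x in original)

    for label, table in (("original", original), ("doubled", doubled)):
        stat = chi2_2x2(*table)
        print(f"{label}: chi2={stat:.2f}, p={p_value_1df(stat):.4f}")
```

Doubling every cell doubles the uncorrected chi-square statistic, so the p value drops sharply even though the proportions haven't changed. That is why it matters whether a table counts people or chromosomes.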

Jonathan Edwards · May 12, 2015

Simon said:
115 patients have 230 SNP versions. Doubling up the numbers in both cases and controls increases the significance of the differences, pushing the p values down (by a factor of nearly ten). It's a bit like comparing a 50% performance difference in a trial of 50 patients on 2 different therapies with a 50% difference in a trial with 100 patients. The bigger study will have a lower p value for the same difference.

edit: try the numbers in an online calculator, that's what I did

OK, I can see that works if we assume that alleles are independently determined in each individual. That may be a reasonable assumption for a totally outbred population, but if there are different racial groups in the sample it might be invalid. Parents are more often racially related than by chance, even if not actually consanguineous in a closer way. And I guess within small communities independence would break down too. I had assumed that one was asking what the likelihood was that the patients belong to a normal population. It seems that the question is whether their alleles belong to a normal pool. I presume that if one were to select a feature of a patient (presence of at least one copy of the allele or whatever - or even a three-column table for one or two copies), one would get a new set of figures which, when processed, should give the same chi square (since probabilities should not depend on the way you move numbers around). It all gets a bit complicated!

But taking a rough and ready approach it does seem that the uncorrected p value for the first SNP might be closer to 0.002. But this is not too different from the 0.003 given. We still have what looks like an uncorrected p value, as you say.
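The independence assumption above is essentially the Hardy-Weinberg assumption, and it can be checked directly from genotype counts. Here is a minimal sketch of the standard goodness-of-fit chi-square (for small samples an exact test is usually preferred). Mixing subpopulations with different allele frequencies tends to produce a deficit of heterozygotes (the Wahlund effect), which is one way the racial-mixing concern would show up. The genotype counts are hypothetical.

```python
import math


def hwe_chi2(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square statistic for departure from Hardy-Weinberg proportions.

    Compares observed genotype counts with p^2 : 2pq : q^2, where p is the
    frequency of allele A estimated from the same counts.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("SNP is monomorphic in this sample; test undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_1df(chi2: float) -> float:
    """Upper-tail p value, 1 df (3 classes - 1 - 1 estimated allele frequency)."""
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical genotype counts: one in equilibrium, one short of heterozygotes.
    for counts in ((25, 50, 25), (40, 20, 40)):
        stat = hwe_chi2(*counts)
        print(counts, f"chi2={stat:.2f}", f"p={p_value_1df(stat):.4g}")
```

If the combined sample departs strongly from these proportions, an allele-count (230 vs 2 × controls) test is on shakier ground, and a genotype- or carrier-based table is the safer comparison.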

sdmcvicar · May 12, 2015

SNP effects in vivo are tough to parse even when the major disease-causing mechanism *is* known. The article had a ton of speculative links to common SEID symptoms regarding how these SNPs might be biologically relevant, but in the end this all falls under "potential modulators of overall disease and symptom-specific severity". Being a protein biochemist rather than a deep sequencing wonk, my mind went in a completely different direction from the general thread. A couple more thoughts for you all:

1) Assessing SNP effects on protein expression is complicated by the fact that multipass transmembrane receptors are difficult to isolate from the cell membrane, given their hydrophobic affinity. It's a significant technical challenge for the field to unravel the SNP effect on transcription -> receptor protein expression link. Still, checking out a TRPM3 knockout mouse model for PEM in the form of a multi-day exercise capacity trial, possibly with some form of maze completion on subsequent days for memory/concentration, would sure be interesting. You know, for those of you out there who love writing grant applications.

2) Known tissue expression of TRPM3 in mice (http://www.informatics.jax.org/tissue/marker/MGI:2443101) indicates that in addition to various nerve tissues, this gene is active in the smooth muscle of major arteries and veins, as well as throughout pretty much all of the kidney. Kind of surprised the authors didn't try to connect this to blood pressure/POTS in any way (unless I missed it). Might have made for an interesting subset analysis...

Simon · May 14, 2015

After further digging it appears that, unfortunately, the paper doesn't correct for multiple comparisons after all.

The study used the chi-squared test to compare the frequency of a SNP in patients vs controls, and helpfully it reports the chi-square statistic as well as the p value. You can calculate the p value directly from this chi-square statistic. For example, for the first SNP in table 1, the chi-squared statistic (using the appropriate one degree of freedom) gives an uncorrected p value of 0.003. That's exactly the p value reported in the same table, indicating that the p value is uncorrected, which suggests that false positives are likely to be an issue.

That said, it's not clear what the most appropriate correction would be. Although they tested 233 SNPs, that doesn't count as 233 independent tests due to linkage (two SNPs close together on a chromosome are often inherited together, so if one SNP is significant in the test, its partner will almost certainly be too: effectively this is one test, not two).
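Both steps (turning a reported chi-square into a p value, then asking what threshold would survive a correction) can be sketched in a few lines. The conversion uses the fact that a 1-df chi-square is the square of a standard normal, so p = erfc(sqrt(chi2/2)). Bonferroni is only one, fairly conservative, option, and the smaller test counts are hypothetical stand-ins for an "effective" number of tests after allowing for linkage, not values estimated from the paper.

```python
import math


def p_value_1df(chi2: float) -> float:
    """Upper-tail p value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    # 3.841 is the textbook 5% critical value for 1 degree of freedom.
    print(f"p for chi2=3.841: {p_value_1df(3.841):.4f}")

    # 233 SNPs were tested; 100 and 50 are hypothetical "effective" counts.
    for m in (233, 100, 50):
        print(f"{m:>3} tests -> per-test threshold {bonferroni_threshold(0.05, m):.5f}")
```

With 233 tests the Bonferroni per-test threshold is about 0.0002, so an uncorrected p of 0.003 would not survive on its own. It would only pass if the effective number of independent tests were 16 or fewer.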
