"The biopsychosocial approach: a note of caution" George Davey Smith (2005/2006)

Barry53

There might be a certain gene that you really have to have to develop ME. There are some genes like that for MS and AS. You still need huge cohorts to pick them out if you are trawling blind because of the statistical problems of false positives.
So would this be something the Big Data approach could work for, to cut through the noise?
 

Woolie

I wasn't thinking of the stats directly. Rather, if you build an abstract model of the genes and the way they interact with environmental stimuli, then I assume this would include a lot of non-linearity, which makes predictions hard and unreliable. The test of the stats then comes in terms of whether you can find the correlations that you have built into the model, given the complexities of other interactions. (I am a believer in testing properties of stats models via simulation under different known stochastic conditions.)
I'm probably only half getting what you're saying here, @user9876. Are you talking about isolating the gene in the first place? Or the process of inferring whether it's predictive of some other health outcome, different from the one originally studied?
The danger that I see is that you may find a gene and if you have a woolly (with two l's and a y) hypothesis about how a disease may be caused you may well come to think the gene is doing one thing when it is doing another. That was precisely the problem for RA and T cells. So if your thinking is vague enough you can use genetics to support almost any idea. The fact that the gene has to be causally first remains a great strength, but it may still lead you up the garden path.

The only real potential weakness of picking out one gene as far as I know is that you may have linkage disequilibrium. That means that people with HLA-B27 might turn out almost always to have a gene for poor posture by some quirk of DNA structure or evolutionary pressures. We worried a lot about that in the 1990s but it looks as if it hardly ever causes a problem. Modern techniques can home in on a DNA sequence and define precisely where the causal factor is likely to be. That can usually be traced to one gene through one trick or another.
Yea, great explanation, @Jonathan Edwards.

Mendelian randomisation methods seem to make a few assumptions. Some of these are stated upfront, others are more implicit.

1. That there is no direct causal relationship between the gene of interest and the health outcome you're studying. If you isolate a fat gene, and want to use that to study whether being fat causes depression, you have to be fairly sure your fat gene doesn't also directly lead to depression independently of the fat thing. This is what you're getting at, @Jonathan Edwards, right? To be fair, this assumption is always stated upfront in the papers. But it seems you might need to establish ways of actually testing it.

2. That the behavioural or biological outcome you're measuring is fundamentally the same whoever is being measured. So if the form of depression seen in people who have your "depression gene" turns out to be fundamentally different from what is seen in people without this gene, you can't use the method. Your conclusion then would have to be limited to "those with gene123-linked depression". You couldn't make statements that generalise to all people with depression.

With loose psychological concepts like depression, this invariance assumption could easily turn out to be violated. We've already seen from studies of inflammation in depression that there seem to be various different profiles that can earn the label depression; some of these have a lot to do with mood and affect, but others are more to do with fatigue and loss of initiative.

3. That the gene of interest is truly randomly distributed in the population - and not confounded with socioeconomic class or other social factors that could affect health practices. So if you're studying a US population, you'd be in real trouble if the allele you're interested in is more prevalent in people of African descent than in those of European descent. These two groups have different socioeconomic profiles, so all you might end up measuring here is the effect of social class on your chosen health outcome.

I'm figuring they control for huge confounds like this. But there may be other more subtle ways these things could operate. What if the allele of interest is unusually prevalent in particular rural communities (like we see in the founder effect)? People with this allele will then be more likely to have other commonalities that affect their health practices - like a rural (perhaps religious) upbringing. Is this what you were getting at @user9876?

Maybe you need to study populations with lots of recent immigration and massive intermixing of gene pools to get around this - maybe places like Israel or possibly Australia?

4. The likelihood of problems is going to be amplified if the prevalence of your allele of interest is low overall, that is, if most people have allele 1 and only a tiny fraction of the population have allele 2. The standard Mendelian randomisation methodology stipulates that at least 5% of the population should have the less frequent allele (or is it 10%? I can't remember). But the minimum figure is still pretty low, and conclusions are going to be much more shaky when the allele of interest is very rare.

This is more something for @user9876, but just imagining the maths from a naive point of view: imagine that 5% of your population have the allele of interest. Even if this gene predicts the marker/behaviour of interest really well, it's never going to be 100%. You could easily end up with results that are driven by 1-2% of your entire sample. So even with a sample of 10,000 participants, you could be relying on key findings from as few as, or even fewer than, 100 cases.
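
To put rough numbers on that, here is a back-of-envelope sketch in Python. The 5% carrier frequency and the 10,000 sample size are the figures from the paragraph above; the 1-2% "driving" share is the same guess as above, not a number from any paper, and the function name `pivotal_cases` is made up purely for illustration.

```python
def pivotal_cases(sample_size, carrier_freq, driving_fraction):
    """Return (carriers, driving_cases) as whole-person counts.

    carrier_freq     -- share of the sample carrying the rarer allele
    driving_fraction -- assumed share of the WHOLE sample whose data drive the result
    """
    carriers = round(sample_size * carrier_freq)
    driving = round(sample_size * driving_fraction)
    return carriers, driving


# Figures from the post: 10,000 participants, 5% carriers, 1-2% driving the result.
for frac in (0.01, 0.02):
    carriers, driving = pivotal_cases(10_000, 0.05, frac)
    print(f"carriers={carriers}, cases driving the result={driving}")
```

With these inputs that works out to 500 carriers, of whom something like 100 to 200 people would be carrying the key finding. Drop the driving share below 1% and you are under 100.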

It seems to me that in this situation you need some way of checking the broader characteristics of these pivotal cases, to establish that they don't differ substantially on any other factors that might affect the outcome of interest.
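
One crude way such a check might look, as a sketch only: compare carriers and non-carriers on a measured background factor using a standardised mean difference (difference in means divided by a pooled standard deviation). I am not claiming this is what Mendelian randomisation studies actually do. The education figures below are invented toy data, and the 0.1 cut-off is a rule of thumb borrowed from the covariate-balance literature, used here as an assumption.

```python
from math import sqrt
from statistics import mean, variance


def standardised_mean_difference(carriers, non_carriers):
    """Difference in group means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(carriers) + variance(non_carriers)) / 2)
    return (mean(carriers) - mean(non_carriers)) / pooled_sd


# Hypothetical toy data: years of education for allele carriers vs non-carriers.
carrier_education = [10, 11, 12, 11, 10, 12]
non_carrier_education = [12, 14, 13, 15, 12, 14, 13, 16]

smd = standardised_mean_difference(carrier_education, non_carrier_education)
print(f"SMD for education: {smd:.2f}")

# Assumed rule of thumb: |SMD| above 0.1 flags a possible imbalance.
if abs(smd) > 0.1:
    print("Carriers differ from non-carriers on this covariate")
```

You would repeat this for every background factor you have measured. It says nothing about the ones you haven't measured, which is the harder half of the problem.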
 