There is a connection between what you describe about statistical power and the problems George Davey Smith mentioned with small cohorts. Big cohorts are desirable from one standpoint, yet almost certain to cloud issues of causation. Big cohorts also correlate with big funding and centralized control, which has been a real problem for us so far.
I've been thinking about some examples of rare diseases which are likely to show up in ME/CFS cohorts unless you are extremely careful. The post on Health Rising about joint hypermobility syndrome reminded me of some odd cases I already knew about.
Few people are terribly surprised when a case labeled CFS turns out to be a case of MS or lupus, even if Dr. House might be. ("It's never lupus.") They would likely never imagine that a patient labeled with CFS could have an autoimmune disease involving IgG4. Such a disease is unlikely ever to reach a TV series; it is simply too rare. If the disease had not been progressive, eventually providing samples for pathologists, I doubt the marker would have been found. There are a lot of things for which we generate antibodies, and in most autoimmune diseases the question of antibody levels is problematic. (I can tell you that the disease just mentioned is now called "IgG4-related disease" because the antibodies are not always reliable biomarkers.)
What I noticed in the JHS case mentioned above was an association with connective tissue disorders. The same thing turned up in the periodic paralysis patient I know personally, even requiring complicated surgery to correct deformities causing life-threatening sleep apnea. (I'm told the most common surgery for sleep apnea introduces problems patients are not warned about in advance, like having food come out your nose. It paid this patient to do his own research.) He also had joint hypermobility in childhood.
The problem we run into in his case is that the incidence of periodic paralysis (PP) is about 1:100,000. The population of the U.S. is not large enough to put together a cohort of 10,000. What is more, there are forms of the disease which could be mistaken for CFS. If you do the right tests you will have little doubt that you have found a patient with PP. (In the case of my friend, it took 5 hours after a provocative test before he could leave the clinic. There were episodes of doctors screaming at the nurses involved when another doctor realized how close they had come to killing him.) The problem is that you will not do the right tests if the doctors involved have never heard of the disease, and have trouble believing in it. ("That's a classic case of somatization leading to catatonia.")
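To make the arithmetic explicit (the population figure below is a rough round number, used only for illustration):

```python
# Rough arithmetic on cohort feasibility.  The population figure is an
# approximation used only for illustration.
us_population = 320_000_000
incidence = 1 / 100_000          # roughly 1 case per 100,000 people

max_patients = us_population * incidence
print(f"Upper bound on PP patients in the U.S.: ~{max_patients:,.0f}")
# ~3,200 -- far short of a 10,000-patient cohort, even if every single
# patient in the country were identified and enrolled.
```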
In reading about the splash in the press caused by a paper on suicide in CFS, I first thought of the point James Coyne later made, that there was no way to control for depression with the Oxford definition. In that case you could prevent suicide by treating patients for depression without even knowing about a diagnosis of CFS. A second thought was that all 5 actual suicides might have been included in the cohort as a result of misdiagnosis. Diagnostic error rates are nowhere near low enough to rule out 5 misdiagnoses out of 2,147.
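To put a rough number on that: the 1% error rate below is only a placeholder (published estimates of diagnostic error are generally argued to be higher), but even that conservative figure makes 5 misdiagnoses out of 2,147 look unremarkable.

```python
# How plausible is it that 5 of 2,147 patients were misdiagnosed?
# The 1% rate is a deliberately conservative placeholder, not a measured value.
from math import comb

n, k, p = 2147, 5, 0.01
expected = n * p                              # expected misdiagnoses at 1%
p_fewer_than_k = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))
print(f"Expected misdiagnoses at 1%: {expected:.0f}")             # ~21
print(f"P(at least {k} misdiagnoses): {1 - p_fewer_than_k:.6f}")  # ~1.0
```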
This is only the tip of an iceberg, because the NHS tends to funnel patients who are diagnostic problems into CFS specialist clinics. Considering the size of the catchment of that clinic, we might be looking at a substantial fraction of all the rare chronic diseases without convenient clinical signs in the U.K. Under these circumstances you can't simply rule out diseases with an incidence of 1:100,000 as too improbable to worry about. There are more such diseases than most doctors imagine.
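The point is easy to miss: each individual rare disease is improbable, but the aggregate is not. A quick sketch, with made-up round numbers for the catchment and for the count of rare diseases:

```python
# Each individual rare disease is improbable, but collectively they are not.
# All numbers here are illustrative round figures, not data.
catchment = 5_000_000              # hypothetical clinic catchment population
per_disease_incidence = 1 / 100_000
n_rare_diseases = 500              # a modest subset of known rare diseases

expected_per_disease = catchment * per_disease_incidence
expected_total = expected_per_disease * n_rare_diseases
print(f"Expected patients per 1:100,000 disease in catchment: ~{expected_per_disease:.0f}")  # ~50
print(f"Expected patients across {n_rare_diseases} such diseases: ~{expected_total:,.0f}")   # ~25,000
```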
Now I feel a need to address the point you made about models. It is no accident that the convenient markers based on levels of certain substances work best when applied to progressive diseases. If you miss the marker at one time, it will become more prominent later; even if you only find it at autopsy, it will make itself known. This is not necessarily true in chronic diseases.
What I'm getting at is that even when we look for markers we regularly place the wrong significance on what we see, because there are far more biological processes where ratios of rates are important than processes strictly responding to concentrations of molecules. Babies are not calibrated like clinical instruments; their organs must be responding to a different kind of signalling. There are plenty to choose from, if anyone in medicine looks at the literature on biological signalling.
Here I want to describe part of my personal history, far from medicine, which provided dramatic examples of defective mental models, some of which nearly killed me in the military.
(Advice: never ask a helicopter pilot what happens if the engine quits while you are flying, unless you enjoy a thrilling demonstration of an autorotation landing. Fortunately, this was tame compared to my regular work -- it didn't even involve explosives.)
Fast forward now to a civilian job creating simulation models for use in training people to perform aircraft maintenance. (Even way back then computers were a lot cheaper than complete jet aircraft, and much less likely to produce fatal consequences.) The first part of designing a simulation is to understand what you are simulating. This means you read manuals, watch operation and go talk to the people who teach the subject. Guess what? Not only do students have defective mental models, you can even find instructors with the problem. You can't code a hand-waving explanation for a computer, except perhaps for amusement. I found that a misunderstanding which had actually resulted in what is euphemistically called "equipment loss" was still not understood on the flight line. The embarrassing problem was not officially classified, but nobody wanted to talk about it (another insight).
We can skip a long series of anecdotes about discrepancies between mental models and reality discovered in a broad range of simulations which reach all the way to the Space Shuttle, and include side trips to subjects very different from aerospace engineering, like the Strategic Petroleum Reserve.
With all this background behind me, I ran into an explanation from a doctor of how a medication worked. This didn't make sense to me, but then I do not have the appropriate training. The Internet was an infant at the time, but I had access to a good academic library. I checked textbooks and reference material, and got different versions of the same thing. My problem was that I could see no way to turn those explanations into an actual model of the process. If it worked at all, the biochemical behavior would have to be considerably different from what was being described.
There was research material available on pharmacology, so I went to that, and kept finding various versions of hand-waving. Finally, going all the way down to the reaction kinetics of the molecules involved, I ran across a paper which said the biological significance of these chemical signals was determined by the ratio of uptake rates at two receptors. That was something I could turn into a model, and it would even yield the behavior being described. It might or might not be true of the actual biology, but it was at least feasible to create a model.
More than simply being feasible, this model supplied considerable sense which had been lacking. It was based on rates rather than levels, and the two receptors were like separate communication channels because molecules which fit one would not fit the other. Since docking of molecules in receptors is a discrete event, you could describe this as differential signalling based on pulse repetition rate. Nature was being quite clever, because this meant changes in concentration which affected both molecules equally would have no effect on the biological signal, a kind of noise immunity. The high specificity of receptors also greatly reduced "noise". This kind of signalling also bypassed a problem which had bothered me: how were biological level-sensing mechanisms calibrated? There is nothing naive about biological signalling; the conceptual problem lies on the human end. I have since learned that correctly functioning biological signalling mechanisms are generally much more sophisticated than most people in medicine imagine. This ought to have an impact on endocrinology, immunology and neurology. Many years later, I'm getting quite tired of waiting.
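This is not the actual pharmacology, and the rate constants and concentrations below are invented, but a toy sketch shows why a ratio-of-rates readout has the noise immunity described above while a level readout does not:

```python
# Toy model: ratio-based signalling vs. level-based signalling.
# All rate constants and concentrations are invented for illustration.

def uptake_rate(concentration, k):
    """First-order uptake at a receptor: rate proportional to concentration."""
    return k * concentration

k_a, k_b = 2.0, 0.5            # uptake rate constants at receptors A and B
conc_a, conc_b = 1.0, 1.0      # baseline concentrations of the two ligands

for common_mode in (1.0, 2.0, 5.0):   # something that scales both concentrations
    rate_a = uptake_rate(conc_a * common_mode, k_a)
    rate_b = uptake_rate(conc_b * common_mode, k_b)
    print(f"scale={common_mode}: level readout A={rate_a:.2f}, "
          f"ratio readout A/B={rate_a / rate_b:.2f}")
# The "level" readout swings with the common-mode factor, while the ratio
# stays fixed at 4.0 -- the noise immunity described above.
```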
The bibliography of that paper told me the concept had already been around for years. Since I was looking at research several years old, I could immediately judge the effect of this breakthrough on the field. The vast bulk of papers afterward went on dealing with the same defective models, tacking on special cases whenever an example which falsified the model appeared. I already knew from other experience that an explosion of special cases was a sign of poor understanding.
Now, it is entirely possible, even likely, for biology to violate human preconceptions about how it ought to work, but in this case there was a model which matched the behavior researchers were describing, as far as I could tell, and it was being ignored. Either the behavior being described was also wrong, and the whole corpus of research nearly worthless, or people were ignoring an internal inconsistency which prevented them from constructing useful models. This continues to this day.
Levels of biomarkers are convenient when you can trust them, but there is no law of nature requiring them. What we already have is anecdotal, but still significant, information about dynamics. It took many years for anyone to actually test patient reports about PEM and reduced capacity. This is a dynamic response to an exercise challenge relating to aerobic thresholds. A great deal of learned nonsense could have been dispensed with if this had been done 30 years ago.
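Just to make "dynamic response" concrete, here is a toy comparison of a measurement repeated a day apart; the workload numbers are invented, not patient data:

```python
# Toy illustration of why a repeated challenge reveals a dynamic response
# that a single test cannot.  Numbers are invented, not measurements.
day1_watts_at_threshold = 110    # workload at aerobic threshold, first test
day2_watts_at_threshold = 75     # same measurement repeated ~24 h later

drop = (day1_watts_at_threshold - day2_watts_at_threshold) / day1_watts_at_threshold
print(f"Change after repeat challenge: {drop:.0%} drop")
# A single-day test would report a "normal-looking" 110 W and miss this entirely.
```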
Quite a number of us also show strange responses to a glucose tolerance test. I suspect we have a lot of people with increased sensitivity to insulin. Another weird response concerns steroids, and I am far from alone when I report that I didn't sleep for 48 hours after a shot. Both these relate to the dynamics of glucocorticoid regulation.
We have similar reports of poor regulation of fluids and electrolytes, requiring conscious measures to maintain them. All of these things fall in the category of dysregulation of the HPA axis, but you can plot points along multiple axes for cluster analysis.
Orthostatic intolerance is still another axis, generally indicating poor autonomic control, if there are not fairly specific defects in muscles and joints.
By the time you get through locating patients in the space defined by all these axes I don't think there will be much overlap with the healthy population. The problem is that medical doctors are fairly consistent in avoiding each of these measurements of variation, preferring to think of the changes as "random".
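As a sketch of what plotting points along all of these axes for cluster analysis might look like, here is a minimal example using synthetic data and k-means from scikit-learn; the axis names, shifts and sample sizes are placeholders, not measurements:

```python
# Minimal sketch: cluster synthetic "patients" in a space whose axes are the
# kinds of dynamic measurements discussed above.  Everything here is made up.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
axes = ["2-day exercise drop", "OGTT response",
        "electrolyte regulation", "orthostatic intolerance"]

healthy = rng.normal(loc=0.0, scale=1.0, size=(100, len(axes)))
patients = rng.normal(loc=1.5, scale=1.0, size=(100, len(axes)))  # shifted on every axis
data = np.vstack([healthy, patients])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)
print("Cluster sizes:", np.bincount(labels))
# A consistent shift across several axes at once separates the groups far more
# cleanly than any single axis would, where the distributions overlap heavily.
```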
Added: Edited to correct 2,400 to 2,147