Recently there has again been news of biomarkers claiming certain accuracies in Long-Covid and ME/CFS patient groups. From my personal point of view there are usually several problems with these studies and the claims they make.
The good studies that make these claims have large sample sizes and validate their findings on held-out subsets that were not part of the training data; such studies are extremely rare. Below I will explain why I believe many of these studies are often sub-optimal and why they should also apply other classification tools with different goals. Furthermore, I will detail why specific attention should be paid to different disease onsets, both in the selection criteria and in the data itself. Otherwise reproducibility might always remain a problem.
In short one can say: it is very hard to accurately distinguish between something you don't understand at all, i.e. ME/CFS, and something you hardly understand at all, i.e. a supposedly healthy human body, even if you use separate training and validation sets.
If we look at the NIH Intramural study, which had the strictest selection criteria of any ME/CFS study ever conducted, with extremely thorough examinations, even there it eventually turned out that 3 people didn't have ME/CFS. Very roughly, this suggests you should always expect that at least 15% of participants in ME/CFS studies don't actually have ME/CFS (one can argue that the sample size was too small to support this, but given the extensive pre-examinations and inclusion criteria, the true rate in less strict studies is likely at least this high). Furthermore, without knowing how ME/CFS works, we should expect that not everything considered ME/CFS today will still be considered ME/CFS in the future; disease categories that don't exist today, with different markers, might appear instead. Finally, there might be different subcategories of ME/CFS, for example between different viral onsets (EBV vs HHV-6 vs MERS vs SARS-CoV-2), and biomarkers could differ between onsets. Not once have I seen a study that accounts for the different disease onsets in its data set.
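To make the sample-size caveat concrete, here is a minimal sketch of how wide the uncertainty around such a misdiagnosis rate actually is. The cohort size of roughly 20 is my assumption for illustration; only the "3 misdiagnosed" figure comes from the discussion above:

```python
# Hedged sketch: how uncertain is a misdiagnosis rate estimated from a small
# cohort? Assume 3 of roughly 20 enrolled participants were later found not
# to have ME/CFS (the exact cohort size is an assumption for illustration).
from scipy.stats import binomtest

misdiagnosed, enrolled = 3, 20          # assumed numbers
rate = misdiagnosed / enrolled          # point estimate: 15%
ci = binomtest(misdiagnosed, enrolled).proportion_ci(confidence_level=0.95)

print(f"point estimate: {rate:.0%}")
print(f"95% CI: {ci.low:.1%} .. {ci.high:.1%}")  # roughly 3% .. 38%
```

Even under these strict conditions the 95% interval stretches from a few percent to well over a third, which is exactly why label noise has to be taken seriously in any downstream classification result.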
Current work also seems to rely more and more blindly on machine learning classifiers, which researchers often use not to find maximal differences in specific markers but to maximise the accuracy with which cohorts can be distinguished, i.e. ME/CFS vs HC or LC vs HC. In essence this means they are often looking for biomarker tests where the difference in any specific marker might be very small, but where some grouped set of markers yields something that applies to everyone who was part of the study. Retrospectively, after all classifiers have been tested, one then chooses the single classifier with the highest accuracy.
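A minimal sketch of why this retrospective pick is dangerous in small cohorts: on completely random data with no signal at all, the best of several off-the-shelf classifiers still tends to score above chance. The cohort size, marker count and model list below are all illustrative assumptions:

```python
# Sketch of the "pick the best classifier afterwards" pitfall: on pure noise
# (labels carry no information), the *maximum* cross-validated accuracy over
# several classifiers drifts above the 50% chance level in a small cohort.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 200))   # 40 "patients", 200 random "markers"
y = rng.integers(0, 2, size=40)  # labels are pure noise

models = {
    "logreg": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm": SVC(),
    "knn": KNeighborsClassifier(),
}
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
best = max(scores, key=scores.get)
print(scores)
print(f"best-of-4 classifier: {best} at {scores[best]:.0%}")  # typically above 50%
```

The inflated "best" score is a pure selection effect; pre-registering one model, or validating the winning classifier on a fresh hold-out set, would expose it.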
For me this shouldn't always be the main objective. I'd rather have a biomarker that identifies as little as 70% of ME/CFS cases, but where, in the cases it does flag, the specific markers differ immensely from HC and from other fatiguing diseases. I have yet to read papers that try to solve this kind of maximisation problem.
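A hedged sketch of what such an alternative objective could look like: instead of maximising pooled accuracy, rank markers by the effect size of the flagged patient subgroup versus controls, subject to the marker flagging a sizeable fraction of patients. The data, the 97.5th-percentile cutoff and the planted "marker 7" signal are all simulated assumptions:

```python
# Alternative objective: find markers where the flagged patients sit far from
# controls, even if only ~70% of patients are flagged at all.
import numpy as np

rng = np.random.default_rng(1)
hc = rng.normal(0, 1, size=(100, 50))      # 100 healthy controls, 50 markers
me = rng.normal(0, 1, size=(100, 50))      # 100 patients
me[:70, 7] += 4.0                          # assumption: 70% of patients shifted hard on marker 7

cutoffs = np.quantile(hc, 0.975, axis=0)   # per-marker "abnormal" cutoff from controls
for j in range(hc.shape[1]):
    flagged = me[:, j] > cutoffs[j]
    sensitivity = flagged.mean()
    if sensitivity < 0.5:                  # require the marker to flag a sizeable subset
        continue
    # effect size of the flagged subgroup vs controls (Cohen's d, pooled SD)
    pooled_sd = np.sqrt((hc[:, j].var(ddof=1) + me[flagged, j].var(ddof=1)) / 2)
    d = (me[flagged, j].mean() - hc[:, j].mean()) / pooled_sd
    print(f"marker {j}: flags {sensitivity:.0%} of patients, Cohen's d = {d:.1f}")
```

Only the marker with the huge subgroup separation survives this filter, whereas an accuracy-maximising classifier would happily blend fifty tiny differences into one opaque score.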
Furthermore, machine learning classifiers often lack any sort of interpretability: if you try to find out why the algorithm prioritised one marker over another, you simply don't know. In a follow-up study a marginally different marker may come out on top, making your results irreproducible. Classical statistical classification methods should therefore be applied as well. Finally, blindly applying different machine learning classifiers and retrospectively choosing the one with the highest accuracy means you are solving a maximisation problem over classifiers rather than over specific markers. These two problems are not necessarily equivalent when sample sizes are small and the diagnosis itself isn't 100% accurate. This can cause problems when you try to reproduce your results in follow-up studies, as a different classifier might turn out to be optimal in a different study. Of course the same could happen if you had tried to maximise differences in specific marker values rather than the accuracy of your test, but in that case you would at least have learned something about the disease, which may not be possible if your results lack interpretability. I'm by no means bashing machine learning models; we should just understand how to use them to solve the problem we actually want to solve. The data should always be made openly available for re-analysis, so that the above problems are minimised.
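As a small illustration of the reproducibility point, assuming a weak, diffuse signal in a small simulated cohort, the marker ranked most important by a black-box model often changes from resample to resample:

```python
# Reproducibility worry: with weak signal and a small cohort, the "most
# important marker" picked by a black-box model shifts between resamples.
# All sizes and effect strengths below are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import resample

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 100))
y = rng.integers(0, 2, size=60)
X[y == 1, :5] += 0.4            # assumption: 5 weakly informative markers

top_pick = []
for seed in range(20):          # 20 bootstrap "replication studies"
    Xb, yb = resample(X, y, random_state=seed)
    rf = RandomForestClassifier(n_estimators=200, random_state=seed).fit(Xb, yb)
    top_pick.append(int(np.argmax(rf.feature_importances_)))

print("top-ranked marker per resample:", top_pick)  # note how often it changes
```

If a follow-up study is effectively one of these resamples, it is no surprise when it crowns a different marker, and without interpretability you can't tell whether that is biology or noise.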
To maximise the chances of finding a biomarker for everyone, not only should the diagnostic criteria and disease-severity thresholds in trials be as strict as possible, possibly even stricter than CCC plus extensive exclusion of everything else; studies should also consider very specific viral onsets, or at least account for them by recording them in the data and including them in the classification models, as sketched below.
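A minimal sketch of what "adding onsets to the data" could mean in practice: record a viral-onset label per participant, stratify the cross-validation splits on it, and report performance per onset subgroup rather than one pooled accuracy. The onset labels and data are simulated assumptions (for controls the label could be read as a matched exposure group):

```python
# Account for onset in both the splits and the reporting: stratify on
# label x onset jointly, then score each onset subgroup separately.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 120
X = rng.normal(size=(n, 30))
y = rng.integers(0, 2, size=n)                        # 1 = patient, 0 = HC
onset = rng.choice(["EBV", "HHV-6", "SARS-CoV-2"], size=n)
strata = [f"{a}|{b}" for a, b in zip(y, onset)]       # joint stratification key

for train, test in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, strata):
    model = LogisticRegression(max_iter=1000).fit(X[train], y[train])
    for grp in np.unique(onset[test]):
        mask = onset[test] == grp
        acc = model.score(X[test][mask], y[test][mask])
        print(f"{grp}: accuracy {acc:.0%} (n={mask.sum()})")
    break  # one fold shown for brevity
```

A pooled accuracy can hide a classifier that works for one onset and fails for another; the per-onset breakdown makes that visible.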
In this sense Covid-ME/CFS provides an ideal patient group for study: it can be followed longitudinally from illness onset and compared to other ME/CFS groups with varying illness durations, such as patients who developed ME/CFS after MERS or some other condition (MERS could be particularly informative here, coming from the same viral family). The RECOVER Long-Covid symptom study received a lot of hate (some of it justified), but if a patient meets the CCC plus exclusion criteria and has a very specific Covid onset, confirmed by a PCR test and accompanied by loss of smell, then you can be fairly sure they have ME/CFS (or at least Long-Covid-ME/CFS, which could still differ from other forms of ME/CFS; even LC-ME/CFS might not always be the same, for example EBV vs HHV-6 reactivation). In that sense the RECOVER study was great in terms of selection criteria, and it seems many didn't understand that, probably because the study was too open to misinterpretation by the media.
At the end of the day, very severe ME/CFS patients are extremely sick, possibly sicker than patients with any other disease in this world. That means that somewhere in their body things are going awfully wrong, and some day a marker will be found for this. Until then we have to maximise our chances of finding markers that are conclusive enough to let us work our way upstream to the "real problems", even if these initially only apply to a subset of patients. You should be as strict as possible, with the most stringently selected set of patients, to hopefully get more conclusive results, which in turn allow you to understand the disease better, helping everybody. I believe Scheibenbogen's group and the team at UCSF are both roughly working in this direction.
As such, statements along the lines of "a 100% accurate biomarker for ME/CFS" don't seem very credible to me, and where no comparison to other diseases is made, they probably just measure the inactivity of people in some way, possibly even indirectly.