SWAlexander
Abstract
As the biomedical community produces datasets that are increasingly complex and high dimensional, there is a need for more sophisticated computational tools to extract biological insights. We present Multiscale PHATE, a method that sweeps through all levels of data granularity to learn abstracted biological features directly predictive of disease outcome. Built on a coarse-graining process called diffusion condensation, Multiscale PHATE learns a data topology that can be analyzed at coarse resolutions for high-level summarizations of data and at fine resolutions for detailed representations of subsets. We apply Multiscale PHATE to a coronavirus disease 2019 (COVID-19) dataset with 54 million cells from 168 hospitalized patients and find that patients who die show CD16hiCD66blo neutrophil and IFN-γ+ granzyme B+ Th17 cell responses. We also show that population groupings from Multiscale PHATE directly fed into a classifier predict disease outcome more accurately than naive featurizations of the data. Multiscale PHATE is broadly generalizable to different data types, including flow cytometry, single-cell RNA sequencing (scRNA-seq), single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq), and clinical variables.
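The abstract names diffusion condensation but does not say how it works. Roughly, it repeatedly pulls every point toward a weighted average of its neighbours and merges points that end up on top of each other. The cluster labels recorded along the way form a fine-to-coarse sequence of granularities. The sketch below is a simplified illustration under stated assumptions, not the Multiscale PHATE implementation: the function names, the Gaussian bandwidth `epsilon`, the `merge_threshold`, and the rule of doubling `epsilon` when points stop moving are all hypothetical choices. It uses dense pairwise distances, so it is only meant for toy inputs.

```python
import numpy as np


def _components(X, threshold):
    """Label points by connected components of the graph joining points closer than `threshold`."""
    n = X.shape[0]
    close = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1) < threshold
    labels = np.full(n, -1)
    next_label = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first search over the "close" graph
            i = stack.pop()
            for j in np.flatnonzero(close[i]):
                if labels[j] < 0:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def diffusion_condensation(X, epsilon=1.0, merge_threshold=1e-2, max_iter=500):
    """Hypothetical sketch: return cluster labels at each granularity, finest first."""
    X = np.asarray(X, dtype=float).copy()
    labels = _components(X, merge_threshold)
    levels = [labels]
    for _ in range(max_iter):
        if labels.max() == 0:  # everything has condensed into one cluster
            break
        sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        kernel = np.exp(-sq_dist / epsilon)             # Gaussian affinities
        P = kernel / kernel.sum(axis=1, keepdims=True)  # row-stochastic diffusion operator
        X_new = P @ X                                   # each point moves to a local weighted mean
        if np.abs(X_new - X).max() < merge_threshold / 10:
            epsilon *= 2.0  # assumed rule: condensation stalled at this scale, so widen the kernel
        X = X_new
        new_labels = _components(X, merge_threshold)
        if new_labels.max() != labels.max():  # record a level only when the cluster count changes
            levels.append(new_labels)
        labels = new_labels
    return levels
```

For example, `diffusion_condensation(np.array([[0.0], [0.1], [10.0], [10.1]]))` goes from four singletons, to two pairs, to a single cluster; picking one of these levels is the toy analogue of choosing a granularity at which to analyse the data.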
We analyzed 208 patient samples using a flow cytometry panel containing markers indicative of T cell subset identity and activation status. After identifying the ideal granularity to analyze the data, we identified CD4+, CD8+ and double-positive T cell subsets (Extended Data Fig. 8c); we zoomed into the CD8+ subset and identified a range of activation states based on the expression of key markers (Extended Data Fig. 8d). After computing the MELD mortality likelihood score, we identified that the T Effector Memory re-expressing CD45RA (TEMRA) population displayed the most enrichment in severe infection. Furthermore, across all CD8+ T cells, the activation state markers PD1, TIM3, HLA-DR and CD45RA were also positively correlated with mortality on DREMI analysis (Extended Data Fig. 8e). In agreement with another study of patients with SARS-CoV-2 (ref. 32), we found a hyperactivated CD8+ T cell response in the form of CD8+CD45RA+TIM3+HLA-DR+PD1+ TEMRA cells likely expressing granzyme B that were correlated with disease lethality.
https://www.nature.com/articles/s41587-021-01186-x
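The excerpt does not define the MELD mortality likelihood score. As a rough idea, MELD treats each cell's sample condition (here, whether the patient died) as a signal on a cell-similarity graph, smooths it with a low-pass graph filter, and normalizes the result into a per-cell relative likelihood. The sketch below only illustrates that idea and is not the `meld` package API. The k-nearest-neighbour graph, the hypothetical `k` and `beta` parameters, and the Tikhonov filter `(I + beta*L)^-1` used in place of MELD's own filter are all assumptions.

```python
import numpy as np


def knn_laplacian(X, k):
    """Combinatorial Laplacian L = D - A of a symmetrized k-nearest-neighbour graph."""
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbours = np.argsort(dist, axis=1)[:, 1:k + 1]  # column 0 is the point itself
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), neighbours.ravel()] = 1.0
    A = np.maximum(A, A.T)  # keep an edge if either point lists the other
    return np.diag(A.sum(axis=1)) - A


def condition_likelihood(X, condition, k=3, beta=10.0):
    """Hypothetical MELD-style score: likelihood of `condition` (e.g. died) vs. its complement."""
    X = np.asarray(X, dtype=float)
    condition = np.asarray(condition, dtype=bool)
    n = X.shape[0]
    # One indicator column per condition, each scaled to sum to 1 so group sizes do not bias the score.
    signals = np.column_stack([condition / condition.sum(),
                               ~condition / (~condition).sum()])
    # Assumed low-pass filter: solve (I + beta * L) S = signals, smoothing each column over the graph.
    smoothed = np.linalg.solve(np.eye(n) + beta * knn_laplacian(X, k), signals)
    return smoothed[:, 0] / smoothed.sum(axis=1)
```

On a toy input with two well-separated clusters, where four of five cells in the first cluster come from patients who died and four of five in the second from survivors, the averaged score is higher in the first cluster. That is the kind of enrichment the excerpt reports for TEMRA cells.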