Essential Regression: A generalizable framework for inferring causal latent factors from multi-omic datasets
Summary: High-dimensional cellular and molecular profiling of biological samples highlights the need for analytical approaches that can integrate multi-omic datasets to generate prioritized causal inferences. Current methods are limited by high dimensionality of the combined datasets, the difference...
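Main Authors: | Xin Bing, Tyler Lovelace, Florentina Bunea, Marten Wegkamp, Sudhir Pai Kasturi, Harinder Singh, Panayiotis V. Benos, Jishnu Das |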
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2022-05-01 |
Series: | Patterns |
Subjects: | |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2666389922000538 |
_version_ | 1818211382254895104 |
---|---|
author | Xin Bing Tyler Lovelace Florentina Bunea Marten Wegkamp Sudhir Pai Kasturi Harinder Singh Panayiotis V. Benos Jishnu Das |
author_facet | Xin Bing Tyler Lovelace Florentina Bunea Marten Wegkamp Sudhir Pai Kasturi Harinder Singh Panayiotis V. Benos Jishnu Das |
author_sort | Xin Bing |
collection | DOAJ |
description | Summary: High-dimensional cellular and molecular profiling of biological samples highlights the need for analytical approaches that can integrate multi-omic datasets to generate prioritized causal inferences. Current methods are limited by high dimensionality of the combined datasets, the differences in their data distributions, and their integration to infer causal relationships. Here, we present Essential Regression (ER), a novel latent-factor-regression-based interpretable machine-learning approach that addresses these problems by identifying latent factors and their likely cause-effect relationships with system-wide outcomes/properties of interest. ER can integrate many multi-omic datasets without structural or distributional assumptions regarding the data. It outperforms a range of state-of-the-art methods in terms of prediction. ER can be coupled with probabilistic graphical modeling, thereby strengthening the causal inferences. The utility of ER is demonstrated using multi-omic system immunology datasets to generate and validate novel cellular and molecular inferences in a wide range of contexts including immunosenescence and immune dysregulation. The bigger picture: Multi-omic technologies for deep cellular and molecular profiling from model organisms or humans have rapidly expanded. However, existing analytical approaches are constrained by the high dimensionality of these datasets, differences in data distributions, and the inability to generate causal inference beyond predictive biomarkers. To address these issues, we developed a novel interpretable machine-learning framework, Essential Regression (ER). ER integrates high-dimensional multi-omic datasets without distributional assumptions regarding the data and identifies significant latent factors and their causal relationships with system-wide outcomes/properties of interest. ER uses higher-order relationships encapsulated in the latent factors, rather than the individual observables, to home in on novel mechanistic insights. Our approach outperforms a range of state-of-the-art methods in terms of prediction and generates novel immunological inferences, consistent with evidence in model organisms. |
first_indexed | 2024-12-12T05:31:37Z |
format | Article |
id | doaj.art-5dd6368ebad64446982ba499f1247b52 |
institution | Directory Open Access Journal |
issn | 2666-3899 |
language | English |
last_indexed | 2024-12-12T05:31:37Z |
publishDate | 2022-05-01 |
publisher | Elsevier |
record_format | Article |
series | Patterns |
spelling | doaj.art-5dd6368ebad64446982ba499f1247b52 2022-12-22T00:36:17Z eng Elsevier Patterns 2666-3899 2022-05-01 3 5 100473 Essential Regression: A generalizable framework for inferring causal latent factors from multi-omic datasets Xin Bing [0], Tyler Lovelace [1], Florentina Bunea [2], Marten Wegkamp [3], Sudhir Pai Kasturi [4], Harinder Singh [5], Panayiotis V. Benos [6], Jishnu Das [7] [0] Department of Statistics and Data Science, Cornell University, Ithaca, NY, USA [1] Department of Computational & Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA; Joint CMU-Pitt PhD Program in Computational Biology, Carnegie Mellon – University of Pittsburgh, Pittsburgh, PA, USA [2] Department of Statistics and Data Science, Cornell University, Ithaca, NY, USA [3] Department of Statistics and Data Science, Cornell University, Ithaca, NY, USA; Department of Mathematics, Cornell University, Ithaca, NY, USA [4] Division of Microbiology and Immunology, Yerkes National Primate Research Center, Emory University, Atlanta, GA, USA [5] Center for Systems Immunology, Departments of Immunology and Computational & Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA; Corresponding author [6] Department of Computational & Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA; Corresponding author [7] Center for Systems Immunology, Departments of Immunology and Computational & Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA; Corresponding author Summary: High-dimensional cellular and molecular profiling of biological samples highlights the need for analytical approaches that can integrate multi-omic datasets to generate prioritized causal inferences. Current methods are limited by high dimensionality of the combined datasets, the differences in their data distributions, and their integration to infer causal relationships. Here, we present Essential Regression (ER), a novel latent-factor-regression-based interpretable machine-learning approach that addresses these problems by identifying latent factors and their likely cause-effect relationships with system-wide outcomes/properties of interest. ER can integrate many multi-omic datasets without structural or distributional assumptions regarding the data. It outperforms a range of state-of-the-art methods in terms of prediction. ER can be coupled with probabilistic graphical modeling, thereby strengthening the causal inferences. The utility of ER is demonstrated using multi-omic system immunology datasets to generate and validate novel cellular and molecular inferences in a wide range of contexts including immunosenescence and immune dysregulation. The bigger picture: Multi-omic technologies for deep cellular and molecular profiling from model organisms or humans have rapidly expanded. However, existing analytical approaches are constrained by the high dimensionality of these datasets, differences in data distributions, and the inability to generate causal inference beyond predictive biomarkers. To address these issues, we developed a novel interpretable machine-learning framework, Essential Regression (ER). ER integrates high-dimensional multi-omic datasets without distributional assumptions regarding the data and identifies significant latent factors and their causal relationships with system-wide outcomes/properties of interest. ER uses higher-order relationships encapsulated in the latent factors, rather than the individual observables, to home in on novel mechanistic insights. Our approach outperforms a range of state-of-the-art methods in terms of prediction and generates novel immunological inferences, consistent with evidence in model organisms. http://www.sciencedirect.com/science/article/pii/S2666389922000538 DSML 2: Proof-of-concept: Data science output has been formulated, implemented, and tested for one domain/problem |
spellingShingle | Xin Bing Tyler Lovelace Florentina Bunea Marten Wegkamp Sudhir Pai Kasturi Harinder Singh Panayiotis V. Benos Jishnu Das Essential Regression: A generalizable framework for inferring causal latent factors from multi-omic datasets Patterns DSML 2: Proof-of-concept: Data science output has been formulated, implemented, and tested for one domain/problem |
title | Essential Regression: A generalizable framework for inferring causal latent factors from multi-omic datasets |
title_full | Essential Regression: A generalizable framework for inferring causal latent factors from multi-omic datasets |
title_fullStr | Essential Regression: A generalizable framework for inferring causal latent factors from multi-omic datasets |
title_full_unstemmed | Essential Regression: A generalizable framework for inferring causal latent factors from multi-omic datasets |
title_short | Essential Regression: A generalizable framework for inferring causal latent factors from multi-omic datasets |
title_sort | essential regression a generalizable framework for inferring causal latent factors from multi omic datasets |
topic | DSML 2: Proof-of-concept: Data science output has been formulated, implemented, and tested for one domain/problem |
url | http://www.sciencedirect.com/science/article/pii/S2666389922000538 |
work_keys_str_mv | AT xinbing essentialregressionageneralizableframeworkforinferringcausallatentfactorsfrommultiomicdatasets AT tylerlovelace essentialregressionageneralizableframeworkforinferringcausallatentfactorsfrommultiomicdatasets AT florentinabunea essentialregressionageneralizableframeworkforinferringcausallatentfactorsfrommultiomicdatasets AT martenwegkamp essentialregressionageneralizableframeworkforinferringcausallatentfactorsfrommultiomicdatasets AT sudhirpaikasturi essentialregressionageneralizableframeworkforinferringcausallatentfactorsfrommultiomicdatasets AT harindersingh essentialregressionageneralizableframeworkforinferringcausallatentfactorsfrommultiomicdatasets AT panayiotisvbenos essentialregressionageneralizableframeworkforinferringcausallatentfactorsfrommultiomicdatasets AT jishnudas essentialregressionageneralizableframeworkforinferringcausallatentfactorsfrommultiomicdatasets |