Essential Regression: A generalizable framework for inferring causal latent factors from multi-omic datasets

Summary: High-dimensional cellular and molecular profiling of biological samples highlights the need for analytical approaches that can integrate multi-omic datasets to generate prioritized causal inferences. Current methods are limited by high dimensionality of the combined datasets, the difference...


Bibliographic Details
Main Authors: Xin Bing, Tyler Lovelace, Florentina Bunea, Marten Wegkamp, Sudhir Pai Kasturi, Harinder Singh, Panayiotis V. Benos, Jishnu Das
Format: Article
Language: English
Published: Elsevier 2022-05-01
Series: Patterns
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S2666389922000538
_version_ 1818211382254895104
author Xin Bing
Tyler Lovelace
Florentina Bunea
Marten Wegkamp
Sudhir Pai Kasturi
Harinder Singh
Panayiotis V. Benos
Jishnu Das
author_facet Xin Bing
Tyler Lovelace
Florentina Bunea
Marten Wegkamp
Sudhir Pai Kasturi
Harinder Singh
Panayiotis V. Benos
Jishnu Das
author_sort Xin Bing
collection DOAJ
description Summary: High-dimensional cellular and molecular profiling of biological samples highlights the need for analytical approaches that can integrate multi-omic datasets to generate prioritized causal inferences. Current methods are limited by high dimensionality of the combined datasets, the differences in their data distributions, and their integration to infer causal relationships. Here, we present Essential Regression (ER), a novel latent-factor-regression-based interpretable machine-learning approach that addresses these problems by identifying latent factors and their likely cause-effect relationships with system-wide outcomes/properties of interest. ER can integrate many multi-omic datasets without structural or distributional assumptions regarding the data. It outperforms a range of state-of-the-art methods in terms of prediction. ER can be coupled with probabilistic graphical modeling, thereby strengthening the causal inferences. The utility of ER is demonstrated using multi-omic system immunology datasets to generate and validate novel cellular and molecular inferences in a wide range of contexts including immunosenescence and immune dysregulation.
The bigger picture: Multi-omic technologies for deep cellular and molecular profiling from model organisms or humans have rapidly expanded. However, existing analytical approaches are constrained by the high dimensionality of these datasets, differences in data distributions, and the inability to generate causal inference beyond predictive biomarkers. To address these issues, we developed a novel interpretable machine-learning framework, Essential Regression (ER). ER integrates high-dimensional multi-omic datasets without distributional assumptions regarding the data and identifies significant latent factors and their causal relationships with system-wide outcomes/properties of interest. ER uses higher-order relationships encapsulated in the latent factors, rather than the individual observables, to home in on novel mechanistic insights. Our approach outperforms a range of state-of-the-art methods in terms of prediction and generates novel immunological inferences, consistent with evidence in model organisms.
first_indexed 2024-12-12T05:31:37Z
format Article
id doaj.art-5dd6368ebad64446982ba499f1247b52
institution Directory Open Access Journal
issn 2666-3899
language English
last_indexed 2024-12-12T05:31:37Z
publishDate 2022-05-01
publisher Elsevier
record_format Article
series Patterns
spelling doaj.art-5dd6368ebad64446982ba499f1247b52 2022-12-22T00:36:17Z eng Elsevier Patterns 2666-3899 2022-05-01 3 5 100473
Essential Regression: A generalizable framework for inferring causal latent factors from multi-omic datasets
Xin Bing (0), Tyler Lovelace (1), Florentina Bunea (2), Marten Wegkamp (3), Sudhir Pai Kasturi (4), Harinder Singh (5), Panayiotis V. Benos (6), Jishnu Das (7)
(0) Department of Statistics and Data Science, Cornell University, Ithaca, NY, USA
(1) Department of Computational & Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA; Joint CMU-Pitt PhD Program in Computational Biology, Carnegie Mellon – University of Pittsburgh, Pittsburgh, PA, USA
(2) Department of Statistics and Data Science, Cornell University, Ithaca, NY, USA
(3) Department of Statistics and Data Science, Cornell University, Ithaca, NY, USA; Department of Mathematics, Cornell University, Ithaca, NY, USA
(4) Division of Microbiology and Immunology, Yerkes National Primate Research Center, Emory University, Atlanta, GA, USA
(5) Center for Systems Immunology, Departments of Immunology and Computational & Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA; Corresponding author
(6) Department of Computational & Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA; Corresponding author
(7) Center for Systems Immunology, Departments of Immunology and Computational & Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA; Corresponding author
Summary: High-dimensional cellular and molecular profiling of biological samples highlights the need for analytical approaches that can integrate multi-omic datasets to generate prioritized causal inferences. Current methods are limited by high dimensionality of the combined datasets, the differences in their data distributions, and their integration to infer causal relationships. Here, we present Essential Regression (ER), a novel latent-factor-regression-based interpretable machine-learning approach that addresses these problems by identifying latent factors and their likely cause-effect relationships with system-wide outcomes/properties of interest. ER can integrate many multi-omic datasets without structural or distributional assumptions regarding the data. It outperforms a range of state-of-the-art methods in terms of prediction. ER can be coupled with probabilistic graphical modeling, thereby strengthening the causal inferences. The utility of ER is demonstrated using multi-omic system immunology datasets to generate and validate novel cellular and molecular inferences in a wide range of contexts including immunosenescence and immune dysregulation.
The bigger picture: Multi-omic technologies for deep cellular and molecular profiling from model organisms or humans have rapidly expanded. However, existing analytical approaches are constrained by the high dimensionality of these datasets, differences in data distributions, and the inability to generate causal inference beyond predictive biomarkers. To address these issues, we developed a novel interpretable machine-learning framework, Essential Regression (ER). ER integrates high-dimensional multi-omic datasets without distributional assumptions regarding the data and identifies significant latent factors and their causal relationships with system-wide outcomes/properties of interest. ER uses higher-order relationships encapsulated in the latent factors, rather than the individual observables, to home in on novel mechanistic insights. Our approach outperforms a range of state-of-the-art methods in terms of prediction and generates novel immunological inferences, consistent with evidence in model organisms.
http://www.sciencedirect.com/science/article/pii/S2666389922000538
DSML 2: Proof-of-concept: Data science output has been formulated, implemented, and tested for one domain/problem
spellingShingle Xin Bing
Tyler Lovelace
Florentina Bunea
Marten Wegkamp
Sudhir Pai Kasturi
Harinder Singh
Panayiotis V. Benos
Jishnu Das
Essential Regression: A generalizable framework for inferring causal latent factors from multi-omic datasets
Patterns
DSML 2: Proof-of-concept: Data science output has been formulated, implemented, and tested for one domain/problem
title Essential Regression: A generalizable framework for inferring causal latent factors from multi-omic datasets
title_full Essential Regression: A generalizable framework for inferring causal latent factors from multi-omic datasets
title_fullStr Essential Regression: A generalizable framework for inferring causal latent factors from multi-omic datasets
title_full_unstemmed Essential Regression: A generalizable framework for inferring causal latent factors from multi-omic datasets
title_short Essential Regression: A generalizable framework for inferring causal latent factors from multi-omic datasets
title_sort essential regression a generalizable framework for inferring causal latent factors from multi omic datasets
topic DSML 2: Proof-of-concept: Data science output has been formulated, implemented, and tested for one domain/problem
url http://www.sciencedirect.com/science/article/pii/S2666389922000538
work_keys_str_mv AT xinbing essentialregressionageneralizableframeworkforinferringcausallatentfactorsfrommultiomicdatasets
AT tylerlovelace essentialregressionageneralizableframeworkforinferringcausallatentfactorsfrommultiomicdatasets
AT florentinabunea essentialregressionageneralizableframeworkforinferringcausallatentfactorsfrommultiomicdatasets
AT martenwegkamp essentialregressionageneralizableframeworkforinferringcausallatentfactorsfrommultiomicdatasets
AT sudhirpaikasturi essentialregressionageneralizableframeworkforinferringcausallatentfactorsfrommultiomicdatasets
AT harindersingh essentialregressionageneralizableframeworkforinferringcausallatentfactorsfrommultiomicdatasets
AT panayiotisvbenos essentialregressionageneralizableframeworkforinferringcausallatentfactorsfrommultiomicdatasets
AT jishnudas essentialregressionageneralizableframeworkforinferringcausallatentfactorsfrommultiomicdatasets