The maximum entropy principle for compositional data
Abstract. Background: Compositional systems, represented as parts of some whole, are ubiquitous. They encompass the abundances of proteins in a cell, the distribution of organisms in nature, and the stoichiometry of the most basic chemical reactions. Thus, a central goal is to understand how such proc...
Main Authors: | Corey Weistuch, Jiening Zhu, Joseph O. Deasy, Allen R. Tannenbaum |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2022-10-01 |
Series: | BMC Bioinformatics |
Subjects: | Networks, Compositional data, Inference, Maximum entropy |
Online Access: | https://doi.org/10.1186/s12859-022-05007-z |
_version_ | 1811257049182371840 |
---|---|
author | Corey Weistuch, Jiening Zhu, Joseph O. Deasy, Allen R. Tannenbaum |
author_facet | Corey Weistuch, Jiening Zhu, Joseph O. Deasy, Allen R. Tannenbaum |
author_sort | Corey Weistuch |
collection | DOAJ |
description | Abstract. Background: Compositional systems, represented as parts of some whole, are ubiquitous. They encompass the abundances of proteins in a cell, the distribution of organisms in nature, and the stoichiometry of the most basic chemical reactions. Thus, a central goal is to understand how such processes emerge from the behaviors of their components and their pairwise interactions. Such a study, however, is challenging for two key reasons. First, such systems are complex and depend, often stochastically, on their constituent parts. Second, the data lie on a simplex, which influences their correlations. Results: To resolve both of these issues, we provide a general and data-driven modeling tool for compositional systems called Compositional Maximum Entropy (CME). By integrating the prior geometric structure of compositions with sample-specific information, CME infers the underlying multivariate relationships between the constituent components. We provide two proofs of principle. First, we measure the relative abundances of different bacteria and infer how they interact. Second, we show that our method outperforms a common alternative for the extraction of gene-gene interactions in triple-negative breast cancer. Conclusions: CME provides novel and biologically intuitive insights and is promising as a comprehensive quantitative framework for compositional data. |
first_indexed | 2024-04-12T17:51:25Z |
format | Article |
id | doaj.art-fe297b185b1b4653a0377abbca1488fd |
institution | Directory Open Access Journal |
issn | 1471-2105 |
language | English |
last_indexed | 2024-04-12T17:51:25Z |
publishDate | 2022-10-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj.art-fe297b185b1b4653a0377abbca1488fd; 2022-12-22T03:22:30Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2022-10-01; 231113; 10.1186/s12859-022-05007-z; The maximum entropy principle for compositional data; Corey Weistuch (0), Jiening Zhu (1), Joseph O. Deasy (2), Allen R. Tannenbaum (3); (0) Department of Medical Physics, Memorial Sloan Kettering Cancer Center; (1) Department of Applied Mathematics & Statistics, Stony Brook University; (2) Department of Medical Physics, Memorial Sloan Kettering Cancer Center; (3) Department of Applied Mathematics & Statistics, Stony Brook University; https://doi.org/10.1186/s12859-022-05007-z; Networks; Compositional data; Inference; Maximum entropy |
spellingShingle | Corey Weistuch, Jiening Zhu, Joseph O. Deasy, Allen R. Tannenbaum; The maximum entropy principle for compositional data; BMC Bioinformatics; Networks, Compositional data, Inference, Maximum entropy |
title | The maximum entropy principle for compositional data |
title_full | The maximum entropy principle for compositional data |
title_fullStr | The maximum entropy principle for compositional data |
title_full_unstemmed | The maximum entropy principle for compositional data |
title_short | The maximum entropy principle for compositional data |
title_sort | maximum entropy principle for compositional data |
topic | Networks, Compositional data, Inference, Maximum entropy |
url | https://doi.org/10.1186/s12859-022-05007-z |
work_keys_str_mv | AT coreyweistuch themaximumentropyprincipleforcompositionaldata AT jieningzhu themaximumentropyprincipleforcompositionaldata AT josephodeasy themaximumentropyprincipleforcompositionaldata AT allenrtannenbaum themaximumentropyprincipleforcompositionaldata AT coreyweistuch maximumentropyprincipleforcompositionaldata AT jieningzhu maximumentropyprincipleforcompositionaldata AT josephodeasy maximumentropyprincipleforcompositionaldata AT allenrtannenbaum maximumentropyprincipleforcompositionaldata |