The maximum entropy principle for compositional data


Bibliographic Details
Main Authors: Corey Weistuch, Jiening Zhu, Joseph O. Deasy, Allen R. Tannenbaum
Format: Article
Language: English
Published: BMC 2022-10-01
Series: BMC Bioinformatics
Subjects: Networks; Compositional data; Inference; Maximum entropy
Online Access: https://doi.org/10.1186/s12859-022-05007-z
collection DOAJ
description Abstract
Background: Compositional systems, represented as parts of some whole, are ubiquitous. They encompass the abundances of proteins in a cell, the distribution of organisms in nature, and the stoichiometry of the most basic chemical reactions. Thus, a central goal is to understand how such processes emerge from the behaviors of their components and their pairwise interactions. Such a study, however, is challenging for two key reasons. Firstly, such systems are complex and depend, often stochastically, on their constituent parts. Secondly, the data lie on a simplex which influences their correlations.
Results: To resolve both of these issues, we provide a general and data-driven modeling tool for compositional systems called Compositional Maximum Entropy (CME). By integrating the prior geometric structure of compositions with sample-specific information, CME infers the underlying multivariate relationships between the constituent components. We provide two proofs of principle. First, we measure the relative abundances of different bacteria and infer how they interact. Second, we show that our method outperforms a common alternative for the extraction of gene-gene interactions in triple-negative breast cancer.
Conclusions: CME provides novel and biologically-intuitive insights and is promising as a comprehensive quantitative framework for compositional data.
first_indexed 2024-04-12T17:51:25Z
id doaj.art-fe297b185b1b4653a0377abbca1488fd
institution Directory Open Access Journal
issn 1471-2105
last_indexed 2024-04-12T17:51:25Z
spelling doaj.art-fe297b185b1b4653a0377abbca1488fd; 2022-12-22T03:22:30Z; eng; BMC; BMC Bioinformatics; ISSN 1471-2105; 2022-10-01; volume 23, issue 1, pages 1-13; doi 10.1186/s12859-022-05007-z
Corey Weistuch (Department of Medical Physics, Memorial Sloan Kettering Cancer Center)
Jiening Zhu (Department of Applied Mathematics & Statistics, Stony Brook University)
Joseph O. Deasy (Department of Medical Physics, Memorial Sloan Kettering Cancer Center)
Allen R. Tannenbaum (Department of Applied Mathematics & Statistics, Stony Brook University)
topic Networks
Compositional data
Inference
Maximum entropy