CTD: An information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models.

We consider the following general family of algorithmic problems that arises in transcriptomics, metabolomics and other fields: given a weighted graph G and a subset of its nodes S, find subsets of S that show significant connectedness within G. A specific solution to this problem may be defined by...

Full description

Bibliographic Details
Main Authors:	Lillian R Thistlethwaite, Varduhi Petrosyan, Xiqi Li, Marcus J Miller, Sarah H Elsea, Aleksandar Milosavljevic
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2021-01-01
Series:	PLoS Computational Biology
Online Access:	https://doi.org/10.1371/journal.pcbi.1008550

_version_	1797944031632687104
author	Lillian R Thistlethwaite Varduhi Petrosyan Xiqi Li Marcus J Miller Sarah H Elsea Aleksandar Milosavljevic
author_facet	Lillian R Thistlethwaite Varduhi Petrosyan Xiqi Li Marcus J Miller Sarah H Elsea Aleksandar Milosavljevic
author_sort	Lillian R Thistlethwaite
collection	DOAJ
description	We consider the following general family of algorithmic problems that arises in transcriptomics, metabolomics and other fields: given a weighted graph G and a subset of its nodes S, find subsets of S that show significant connectedness within G. A specific solution to this problem may be defined by devising a scoring function, the Maximum Clique problem being a classic example, where S includes all nodes in G and where the score is defined by the size of the largest subset of S fully connected within G. Major practical obstacles for the plethora of algorithms addressing this type of problem include computational efficiency and, particularly for more complex scores which take edge weights into account, the computational cost of permutation testing, a statistical procedure required to obtain a bound on the p-value for a connectedness score. To address these problems, we developed CTD, "Connect the Dots", a fast algorithm based on data compression that detects highly connected subsets within S. CTD provides information-theoretic upper bounds on p-values when S contains a small fraction of nodes in G without requiring computationally costly permutation testing. We apply the CTD algorithm to interpret multi-metabolite perturbations due to inborn errors of metabolism and multi-transcript perturbations associated with breast cancer in the context of disease-specific Gaussian Markov Random Field networks learned directly from respective molecular profiling data.
first_indexed	2024-04-10T20:33:21Z
format	Article
id	doaj.art-d8935c88ca1248f9bd4404aed5f9f659
institution	Directory Open Access Journal
issn	1553-734X 1553-7358
language	English
last_indexed	2024-04-10T20:33:21Z
publishDate	2021-01-01
publisher	Public Library of Science (PLoS)
record_format	Article
series	PLoS Computational Biology
spelling	doaj.art-d8935c88ca1248f9bd4404aed5f9f6592023-01-25T05:31:57ZengPublic Library of Science (PLoS)PLoS Computational Biology1553-734X1553-73582021-01-01171e100855010.1371/journal.pcbi.1008550CTD: An information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models.Lillian R ThistlethwaiteVarduhi PetrosyanXiqi LiMarcus J MillerSarah H ElseaAleksandar MilosavljevicWe consider the following general family of algorithmic problems that arises in transcriptomics, metabolomics and other fields: given a weighted graph G and a subset of its nodes S, find subsets of S that show significant connectedness within G. A specific solution to this problem may be defined by devising a scoring function, the Maximum Clique problem being a classic example, where S includes all nodes in G and where the score is defined by the size of the largest subset of S fully connected within G. Major practical obstacles for the plethora of algorithms addressing this type of problem include computational efficiency and, particularly for more complex scores which take edge weights into account, the computational cost of permutation testing, a statistical procedure required to obtain a bound on the p-value for a connectedness score. To address these problems, we developed CTD, "Connect the Dots", a fast algorithm based on data compression that detects highly connected subsets within S. CTD provides information-theoretic upper bounds on p-values when S contains a small fraction of nodes in G without requiring computationally costly permutation testing. We apply the CTD algorithm to interpret multi-metabolite perturbations due to inborn errors of metabolism and multi-transcript perturbations associated with breast cancer in the context of disease-specific Gaussian Markov Random Field networks learned directly from respective molecular profiling data.https://doi.org/10.1371/journal.pcbi.1008550
spellingShingle	Lillian R Thistlethwaite Varduhi Petrosyan Xiqi Li Marcus J Miller Sarah H Elsea Aleksandar Milosavljevic CTD: An information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models. PLoS Computational Biology
title	CTD: An information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models.
title_full	CTD: An information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models.
title_fullStr	CTD: An information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models.
title_full_unstemmed	CTD: An information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models.
title_short	CTD: An information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models.
title_sort	ctd an information theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models
url	https://doi.org/10.1371/journal.pcbi.1008550
work_keys_str_mv	AT lillianrthistlethwaite ctdaninformationtheoreticalgorithmtointerpretsetsofmetabolomicandtranscriptomicperturbationsinthecontextofgraphicalmodels AT varduhipetrosyan ctdaninformationtheoreticalgorithmtointerpretsetsofmetabolomicandtranscriptomicperturbationsinthecontextofgraphicalmodels AT xiqili ctdaninformationtheoreticalgorithmtointerpretsetsofmetabolomicandtranscriptomicperturbationsinthecontextofgraphicalmodels AT marcusjmiller ctdaninformationtheoreticalgorithmtointerpretsetsofmetabolomicandtranscriptomicperturbationsinthecontextofgraphicalmodels AT sarahhelsea ctdaninformationtheoreticalgorithmtointerpretsetsofmetabolomicandtranscriptomicperturbationsinthecontextofgraphicalmodels AT aleksandarmilosavljevic ctdaninformationtheoreticalgorithmtointerpretsetsofmetabolomicandtranscriptomicperturbationsinthecontextofgraphicalmodels

CTD: An information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models.

Similar Items