A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data

Abstract Background High-dimensional data of discrete and skewed nature is commonly encountered in high-throughput sequencing studies. Analyzing the network itself or the interplay between genes in this type of data continues to present many challenges. As data visualization techniques become cumber...

Bibliographic Details
Main Authors:	Anjali Silva, Steven J. Rothstein, Paul D. McNicholas, Sanjeena Subedi
Format:	Article
Language:	English
Published:	BMC 2019-07-01
Series:	BMC Bioinformatics
Subjects:	Clustering; RNA sequencing; Discrete data; Multivariate Poisson-log normal distribution; Markov chain Monte Carlo; Co-expression networks
Online Access:	http://link.springer.com/article/10.1186/s12859-019-2916-0

author	Anjali Silva, Steven J. Rothstein, Paul D. McNicholas, Sanjeena Subedi
author_sort	Anjali Silva
collection	DOAJ
description	Abstract. Background: High-dimensional data of discrete and skewed nature is commonly encountered in high-throughput sequencing studies. Analyzing the network itself or the interplay between genes in this type of data continues to present many challenges. As data visualization techniques become cumbersome for higher dimensions and unconvincing when there is no clear separation between homogeneous subgroups within the data, cluster analysis provides an intuitive alternative. The aim of applying mixture model-based clustering in this context is to discover groups of co-expressed genes, which can shed light on biological functions and pathways of gene products. Results: A mixture of multivariate Poisson-log normal (MPLN) model is developed for clustering of high-throughput transcriptome sequencing data. Parameter estimation is carried out using a Markov chain Monte Carlo expectation-maximization (MCMC-EM) algorithm, and information criteria are used for model selection. Conclusions: The mixture of MPLN model is able to fit a wide range of correlation and overdispersion situations, and is suited for modeling multivariate count data from RNA sequencing studies. All scripts used for implementing the method can be found at https://github.com/anjalisilva/MPLNClust.
first_indexed	2024-12-21T20:24:40Z
format	Article
id	doaj.art-ee72178a189a44eabfde4c056f076cdf
institution	Directory Open Access Journal
issn	1471-2105
language	English
last_indexed	2024-12-21T20:24:40Z
publishDate	2019-07-01
publisher	BMC
record_format	Article
series	BMC Bioinformatics
spelling	doaj.art-ee72178a189a44eabfde4c056f076cdf | 2022-12-21T18:51:24Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2019-07-01 | 201111 | 10.1186/s12859-019-2916-0 | A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data | Anjali Silva (0), Steven J. Rothstein (1), Paul D. McNicholas (2), Sanjeena Subedi (3) | Department of Mathematics and Statistics, University of Guelph; Department of Molecular and Cellular Biology, University of Guelph; Department of Mathematics and Statistics, McMaster University; Department of Mathematical Sciences, Binghamton University | Abstract Background High-dimensional data of discrete and skewed nature is commonly encountered in high-throughput sequencing studies. Analyzing the network itself or the interplay between genes in this type of data continues to present many challenges. As data visualization techniques become cumbersome for higher dimensions and unconvincing when there is no clear separation between homogeneous subgroups within the data, cluster analysis provides an intuitive alternative. The aim of applying mixture model-based clustering in this context is to discover groups of co-expressed genes, which can shed light on biological functions and pathways of gene products. Results A mixture of multivariate Poisson-log normal (MPLN) model is developed for clustering of high-throughput transcriptome sequencing data. Parameter estimation is carried out using a Markov chain Monte Carlo expectation-maximization (MCMC-EM) algorithm, and information criteria are used for model selection. Conclusions The mixture of MPLN model is able to fit a wide range of correlation and overdispersion situations, and is suited for modeling multivariate count data from RNA sequencing studies. All scripts used for implementing the method can be found at https://github.com/anjalisilva/MPLNClust. | http://link.springer.com/article/10.1186/s12859-019-2916-0 | Clustering; RNA sequencing; Discrete data; Multivariate Poisson-log normal distribution; Markov chain Monte Carlo; Co-expression networks
title	A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data
topic	Clustering; RNA sequencing; Discrete data; Multivariate Poisson-log normal distribution; Markov chain Monte Carlo; Co-expression networks
url	http://link.springer.com/article/10.1186/s12859-019-2916-0

Similar Items