A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data
Abstract. Background: High-dimensional data of discrete and skewed nature is commonly encountered in high-throughput sequencing studies. Analyzing the network itself or the interplay between genes in this type of data continues to present many challenges. As data visualization techniques become cumber...
Main Authors: | Anjali Silva, Steven J. Rothstein, Paul D. McNicholas, Sanjeena Subedi |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2019-07-01 |
Series: | BMC Bioinformatics |
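Subjects: | Clustering, RNA sequencing, Discrete data, Multivariate Poisson-log normal distribution, Markov chain Monte Carlo, Co-expression networks |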
Online Access: | http://link.springer.com/article/10.1186/s12859-019-2916-0 |
_version_ | 1819082941598269440 |
---|---|
author | Anjali Silva; Steven J. Rothstein; Paul D. McNicholas; Sanjeena Subedi |
author_sort | Anjali Silva |
collection | DOAJ |
description | Abstract. Background: High-dimensional data of discrete and skewed nature is commonly encountered in high-throughput sequencing studies. Analyzing the network itself or the interplay between genes in this type of data continues to present many challenges. As data visualization techniques become cumbersome for higher dimensions and unconvincing when there is no clear separation between homogeneous subgroups within the data, cluster analysis provides an intuitive alternative. The aim of applying mixture model-based clustering in this context is to discover groups of co-expressed genes, which can shed light on biological functions and pathways of gene products. Results: A mixture of multivariate Poisson-log normal (MPLN) model is developed for clustering of high-throughput transcriptome sequencing data. Parameter estimation is carried out using a Markov chain Monte Carlo expectation-maximization (MCMC-EM) algorithm, and information criteria are used for model selection. Conclusions: The mixture of MPLN model is able to fit a wide range of correlation and overdispersion situations, and is suited for modeling multivariate count data from RNA sequencing studies. All scripts used for implementing the method can be found at https://github.com/anjalisilva/MPLNClust. |
first_indexed | 2024-12-21T20:24:40Z |
format | Article |
id | doaj.art-ee72178a189a44eabfde4c056f076cdf |
institution | Directory of Open Access Journals |
issn | 1471-2105 |
language | English |
last_indexed | 2024-12-21T20:24:40Z |
publishDate | 2019-07-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj.art-ee72178a189a44eabfde4c056f076cdf; 2022-12-21T18:51:24Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2019-07-01; 20111; 10.1186/s12859-019-2916-0; A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data; Anjali Silva (0), Steven J. Rothstein (1), Paul D. McNicholas (2), Sanjeena Subedi (3); (0) Department of Mathematics and Statistics, University of Guelph; (1) Department of Molecular and Cellular Biology, University of Guelph; (2) Department of Mathematics and Statistics, McMaster University; (3) Department of Mathematical Sciences, Binghamton University; Abstract. Background: High-dimensional data of discrete and skewed nature is commonly encountered in high-throughput sequencing studies. Analyzing the network itself or the interplay between genes in this type of data continues to present many challenges. As data visualization techniques become cumbersome for higher dimensions and unconvincing when there is no clear separation between homogeneous subgroups within the data, cluster analysis provides an intuitive alternative. The aim of applying mixture model-based clustering in this context is to discover groups of co-expressed genes, which can shed light on biological functions and pathways of gene products. Results: A mixture of multivariate Poisson-log normal (MPLN) model is developed for clustering of high-throughput transcriptome sequencing data. Parameter estimation is carried out using a Markov chain Monte Carlo expectation-maximization (MCMC-EM) algorithm, and information criteria are used for model selection. Conclusions: The mixture of MPLN model is able to fit a wide range of correlation and overdispersion situations, and is suited for modeling multivariate count data from RNA sequencing studies. All scripts used for implementing the method can be found at https://github.com/anjalisilva/MPLNClust.; http://link.springer.com/article/10.1186/s12859-019-2916-0; Clustering; RNA sequencing; Discrete data; Multivariate Poisson-log normal distribution; Markov chain Monte Carlo; Co-expression networks |
spellingShingle | Anjali Silva; Steven J. Rothstein; Paul D. McNicholas; Sanjeena Subedi; A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data; BMC Bioinformatics; Clustering; RNA sequencing; Discrete data; Multivariate Poisson-log normal distribution; Markov chain Monte Carlo; Co-expression networks |
title | A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data |
title_sort | multivariate poisson log normal mixture model for clustering transcriptome sequencing data |
topic | Clustering; RNA sequencing; Discrete data; Multivariate Poisson-log normal distribution; Markov chain Monte Carlo; Co-expression networks |
url | http://link.springer.com/article/10.1186/s12859-019-2916-0 |