Statistical analysis of variability in TnSeq data across conditions using zero-inflated negative binomial regression

Abstract Background Deep sequencing of transposon mutant libraries (or TnSeq) is a powerful method for probing essentiality of genomic loci under different environmental conditions. Various analytical methods have been described for identifying conditionally essential genes whose tolerance for inser...


Bibliographic Details
Main Authors: Siddharth Subramaniyam, Michael A. DeJesus, Anisha Zaveri, Clare M. Smith, Richard E. Baker, Sabine Ehrt, Dirk Schnappinger, Christopher M. Sassetti, Thomas R. Ioerger
Format: Article
Language: English
Published: BMC 2019-11-01
Series: BMC Bioinformatics
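Subjects: TnSeq; Transposon insertion library; Essentiality; Zero-inflated negative binomial distribution; Mycobacterium tuberculosis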
Online Access: http://link.springer.com/article/10.1186/s12859-019-3156-z
_version_ 1818553647588442112
author Siddharth Subramaniyam
Michael A. DeJesus
Anisha Zaveri
Clare M. Smith
Richard E. Baker
Sabine Ehrt
Dirk Schnappinger
Christopher M. Sassetti
Thomas R. Ioerger
author_sort Siddharth Subramaniyam
collection DOAJ
description Abstract Background: Deep sequencing of transposon mutant libraries (or TnSeq) is a powerful method for probing essentiality of genomic loci under different environmental conditions. Various analytical methods have been described for identifying conditionally essential genes whose tolerance for insertions varies between two conditions. However, for large-scale experiments involving many conditions, a method is needed for identifying genes that exhibit significant variability in insertions across multiple conditions. Results: In this paper, we introduce a novel statistical method for identifying genes with significant variability of insertion counts across multiple conditions based on Zero-Inflated Negative Binomial (ZINB) regression. Using likelihood ratio tests, we show that the ZINB distribution fits TnSeq data better than either ANOVA or a Negative Binomial (in a generalized linear model). We use ZINB regression to identify genes required for infection of M. tuberculosis H37Rv in C57BL/6 mice. We also use ZINB to perform an analysis of genes conditionally essential in H37Rv cultures exposed to multiple antibiotics. Conclusions: Our results show that ZINB not only generally identifies most of the genes found by pairwise resampling (and vastly outperforms ANOVA), but also identifies additional genes where variability is detectable only when the magnitudes of insertion counts are treated separately from local differences in saturation, as in the ZINB model.
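The separation the abstract describes, between the magnitude of insertion counts and the local saturation, can be illustrated with a minimal sketch of a zero-inflated negative binomial likelihood. This is not the authors' implementation. The parameterization (`pi` for the extra-zero probability, `mu` for the mean of the count component, `r` for the dispersion) and all function names are assumptions made for illustration only. The likelihood ratio statistic at the end presumes that both log-likelihoods have already been maximized by some fitting routine, which is not shown here.

```python
import math


def nb_logpmf(k, mu, r):
    """Log-probability of count k under a negative binomial with mean mu and dispersion r."""
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + r * math.log(r / (r + mu))
            + k * math.log(mu / (r + mu)))


def zinb_pmf(k, pi, mu, r):
    """ZINB probability: an extra point mass pi at zero, mixed with an NB count component."""
    nb = math.exp(nb_logpmf(k, mu, r))
    if k == 0:
        # Zeros arise either from the inflation component or from the NB itself.
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb


def zinb_loglik(counts, pi, mu, r):
    """Log-likelihood of a list of insertion counts under one ZINB parameter set."""
    return sum(math.log(zinb_pmf(k, pi, mu, r)) for k in counts)


def lrt_statistic(ll_null, ll_alt):
    """Likelihood ratio statistic, given maximized log-likelihoods of nested models."""
    return 2.0 * (ll_alt - ll_null)
```

In this sketch `pi` plays the role of the saturation-related term (how often a site has no insertions at all) and `mu` the magnitude term (how many reads a site has when it is hit). A condition-dependent model would let these vary by condition, while the null model would share them across conditions.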
first_indexed 2024-12-12T09:28:33Z
format Article
id doaj.art-c80d20f8c434430692d7e045fc78bfc1
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-12T09:28:33Z
publishDate 2019-11-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-c80d20f8c434430692d7e045fc78bfc1
2022-12-22T00:28:57Z
eng
BMC
BMC Bioinformatics
1471-2105
2019-11-01
201115
10.1186/s12859-019-3156-z
Statistical analysis of variability in TnSeq data across conditions using zero-inflated negative binomial regression
Siddharth Subramaniyam 0
Michael A. DeJesus 1
Anisha Zaveri 2
Clare M. Smith 3
Richard E. Baker 4
Sabine Ehrt 5
Dirk Schnappinger 6
Christopher M. Sassetti 7
Thomas R. Ioerger 8
Department of Computer Science & Engineering, Texas A&M University
Rockefeller University
Department of Microbiology & Immunology, Weill Cornell Medical College
Department of Microbiology & Physiological Systems, University of Massachusetts Medical School
Department of Microbiology & Physiological Systems, University of Massachusetts Medical School
Department of Microbiology & Immunology, Weill Cornell Medical College
Department of Microbiology & Immunology, Weill Cornell Medical College
Department of Microbiology & Physiological Systems, University of Massachusetts Medical School
Department of Computer Science & Engineering, Texas A&M University
Abstract Background: Deep sequencing of transposon mutant libraries (or TnSeq) is a powerful method for probing essentiality of genomic loci under different environmental conditions. Various analytical methods have been described for identifying conditionally essential genes whose tolerance for insertions varies between two conditions. However, for large-scale experiments involving many conditions, a method is needed for identifying genes that exhibit significant variability in insertions across multiple conditions. Results: In this paper, we introduce a novel statistical method for identifying genes with significant variability of insertion counts across multiple conditions based on Zero-Inflated Negative Binomial (ZINB) regression. Using likelihood ratio tests, we show that the ZINB distribution fits TnSeq data better than either ANOVA or a Negative Binomial (in a generalized linear model). We use ZINB regression to identify genes required for infection of M. tuberculosis H37Rv in C57BL/6 mice. We also use ZINB to perform an analysis of genes conditionally essential in H37Rv cultures exposed to multiple antibiotics. Conclusions: Our results show that ZINB not only generally identifies most of the genes found by pairwise resampling (and vastly outperforms ANOVA), but also identifies additional genes where variability is detectable only when the magnitudes of insertion counts are treated separately from local differences in saturation, as in the ZINB model.
http://link.springer.com/article/10.1186/s12859-019-3156-z
TnSeq
Transposon insertion library
Essentiality
Zero-inflated negative binomial distribution
Mycobacterium tuberculosis
title Statistical analysis of variability in TnSeq data across conditions using zero-inflated negative binomial regression
topic TnSeq
Transposon insertion library
Essentiality
Zero-inflated negative binomial distribution
Mycobacterium tuberculosis
url http://link.springer.com/article/10.1186/s12859-019-3156-z