Hierarchically Labeled Database Indexing Allows Scalable Characterization of Microbiomes

Summary: Increasingly available microbial reference data allow interpreting the composition and function of previously uncharacterized microbial communities in detail, via high-throughput sequencing analysis. However, efficient methods for read classification are required when the best database matc...


Bibliographic Details
Main Authors: Filippo Utro, Niina Haiminen, Enrico Siragusa, Laura-Jayne Gardiner, Ed Seabolt, Ritesh Krishna, James H. Kaufman, Laxmi Parida
Format: Article
Language: English
Published: Elsevier 2020-04-01
Series: iScience
Online Access: http://www.sciencedirect.com/science/article/pii/S2589004220301723
author Filippo Utro
Niina Haiminen
Enrico Siragusa
Laura-Jayne Gardiner
Ed Seabolt
Ritesh Krishna
James H. Kaufman
Laxmi Parida
author_sort Filippo Utro
collection DOAJ
description Summary: Increasingly available microbial reference data allow interpreting the composition and function of previously uncharacterized microbial communities in detail, via high-throughput sequencing analysis. However, efficient methods for read classification are required when the best database matches for short sequence reads are often shared among multiple reference sequences. Here, we take advantage of the fact that microbial sequences can be annotated relative to established tree structures, and we develop a highly scalable read classifier, PRROMenade, by enhancing the generalized Burrows-Wheeler transform with a labeling step to directly assign reads to the corresponding lowest taxonomic unit in an annotation tree. PRROMenade solves the multi-matching problem while allowing fast variable-size sequence classification for phylogenetic or functional annotation. Our simulations with 5% added differences from reference indicated only 1.5% error rate for PRROMenade functional classification. On metatranscriptomic data PRROMenade highlighted biologically relevant functional pathways related to diet-induced changes in the human gut microbiome. Subject Areas: Microbiology, Microbial Genetics, Bioinformatics
first_indexed 2024-12-13T12:24:28Z
format Article
id doaj.art-3dbecda796664f1e947437d45d3d1f3e
institution Directory Open Access Journal
issn 2589-0042
language English
last_indexed 2024-12-13T12:24:28Z
publishDate 2020-04-01
publisher Elsevier
record_format Article
series iScience
spelling doaj.art-3dbecda796664f1e947437d45d3d1f3e; 2022-12-21T23:46:25Z; eng; Elsevier; iScience; 2589-0042; 2020-04-01; 23; 4
Hierarchically Labeled Database Indexing Allows Scalable Characterization of Microbiomes
Filippo Utro (IBM Research, T.J. Watson Research Center, Yorktown Heights, NY 10598, USA)
Niina Haiminen (IBM Research, T.J. Watson Research Center, Yorktown Heights, NY 10598, USA)
Enrico Siragusa (IBM Research, T.J. Watson Research Center, Yorktown Heights, NY 10598, USA)
Laura-Jayne Gardiner (IBM Research, The Hartree Centre, Warrington, WA4 4AD, UK)
Ed Seabolt (IBM Research, Almaden Research Center, San Jose, CA 95120, USA)
Ritesh Krishna (IBM Research, The Hartree Centre, Warrington, WA4 4AD, UK)
James H. Kaufman (IBM Research, Almaden Research Center, San Jose, CA 95120, USA; Corresponding author)
Laxmi Parida (IBM Research, T.J. Watson Research Center, Yorktown Heights, NY 10598, USA; Corresponding author)
http://www.sciencedirect.com/science/article/pii/S2589004220301723
title Hierarchically Labeled Database Indexing Allows Scalable Characterization of Microbiomes
url http://www.sciencedirect.com/science/article/pii/S2589004220301723