Hierarchically Labeled Database Indexing Allows Scalable Characterization of Microbiomes
Summary: Increasingly available microbial reference data allow interpreting the composition and function of previously uncharacterized microbial communities in detail, via high-throughput sequencing analysis. However, efficient methods for read classification are required when the best database matc...
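Main Authors: | Filippo Utro, Niina Haiminen, Enrico Siragusa, Laura-Jayne Gardiner, Ed Seabolt, Ritesh Krishna, James H. Kaufman, Laxmi Parida |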
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2020-04-01 |
Series: | iScience |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2589004220301723 |
_version_ | 1818327953577082880 |
---|---|
author | Filippo Utro Niina Haiminen Enrico Siragusa Laura-Jayne Gardiner Ed Seabolt Ritesh Krishna James H. Kaufman Laxmi Parida |
author_facet | Filippo Utro Niina Haiminen Enrico Siragusa Laura-Jayne Gardiner Ed Seabolt Ritesh Krishna James H. Kaufman Laxmi Parida |
author_sort | Filippo Utro |
collection | DOAJ |
description | Summary: Increasingly available microbial reference data allow interpreting the composition and function of previously uncharacterized microbial communities in detail, via high-throughput sequencing analysis. However, efficient methods for read classification are required when the best database matches for short sequence reads are often shared among multiple reference sequences. Here, we take advantage of the fact that microbial sequences can be annotated relative to established tree structures, and we develop a highly scalable read classifier, PRROMenade, by enhancing the generalized Burrows-Wheeler transform with a labeling step to directly assign reads to the corresponding lowest taxonomic unit in an annotation tree. PRROMenade solves the multi-matching problem while allowing fast variable-size sequence classification for phylogenetic or functional annotation. Our simulations with 5% added differences from the reference indicated an error rate of only 1.5% for PRROMenade functional classification. On metatranscriptomic data, PRROMenade highlighted biologically relevant functional pathways related to diet-induced changes in the human gut microbiome. Subject Areas: Microbiology, Microbial Genetics, Bioinformatics |
first_indexed | 2024-12-13T12:24:28Z |
format | Article |
id | doaj.art-3dbecda796664f1e947437d45d3d1f3e |
institution | Directory of Open Access Journals |
issn | 2589-0042 |
language | English |
last_indexed | 2024-12-13T12:24:28Z |
publishDate | 2020-04-01 |
publisher | Elsevier |
record_format | Article |
series | iScience |
spelling | doaj.art-3dbecda796664f1e947437d45d3d1f3e 2022-12-21T23:46:25Z eng Elsevier iScience 2589-0042 2020-04-01 234 Hierarchically Labeled Database Indexing Allows Scalable Characterization of Microbiomes. Filippo Utro [0], Niina Haiminen [1], Enrico Siragusa [2], Laura-Jayne Gardiner [3], Ed Seabolt [4], Ritesh Krishna [5], James H. Kaufman [6], Laxmi Parida [7]. [0] IBM Research, T.J. Watson Research Center, Yorktown Heights, NY 10598, USA. [1] IBM Research, T.J. Watson Research Center, Yorktown Heights, NY 10598, USA. [2] IBM Research, T.J. Watson Research Center, Yorktown Heights, NY 10598, USA. [3] IBM Research, The Hartree Centre, Warrington, WA4 4AD, UK. [4] IBM Research, Almaden Research Center, San Jose, CA 95120, USA. [5] IBM Research, The Hartree Centre, Warrington, WA4 4AD, UK. [6] IBM Research, Almaden Research Center, San Jose, CA 95120, USA; Corresponding author. [7] IBM Research, T.J. Watson Research Center, Yorktown Heights, NY 10598, USA; Corresponding author. Summary: Increasingly available microbial reference data allow interpreting the composition and function of previously uncharacterized microbial communities in detail, via high-throughput sequencing analysis. However, efficient methods for read classification are required when the best database matches for short sequence reads are often shared among multiple reference sequences. Here, we take advantage of the fact that microbial sequences can be annotated relative to established tree structures, and we develop a highly scalable read classifier, PRROMenade, by enhancing the generalized Burrows-Wheeler transform with a labeling step to directly assign reads to the corresponding lowest taxonomic unit in an annotation tree. PRROMenade solves the multi-matching problem while allowing fast variable-size sequence classification for phylogenetic or functional annotation. Our simulations with 5% added differences from reference indicated only 1.5% error rate for PRROMenade functional classification. On metatranscriptomic data PRROMenade highlighted biologically relevant functional pathways related to diet-induced changes in the human gut microbiome. Subject Areas: Microbiology, Microbial Genetics, Bioinformatics. http://www.sciencedirect.com/science/article/pii/S2589004220301723 |
spellingShingle | Filippo Utro Niina Haiminen Enrico Siragusa Laura-Jayne Gardiner Ed Seabolt Ritesh Krishna James H. Kaufman Laxmi Parida Hierarchically Labeled Database Indexing Allows Scalable Characterization of Microbiomes iScience |
title | Hierarchically Labeled Database Indexing Allows Scalable Characterization of Microbiomes |
title_full | Hierarchically Labeled Database Indexing Allows Scalable Characterization of Microbiomes |
title_fullStr | Hierarchically Labeled Database Indexing Allows Scalable Characterization of Microbiomes |
title_full_unstemmed | Hierarchically Labeled Database Indexing Allows Scalable Characterization of Microbiomes |
title_short | Hierarchically Labeled Database Indexing Allows Scalable Characterization of Microbiomes |
title_sort | hierarchically labeled database indexing allows scalable characterization of microbiomes |
url | http://www.sciencedirect.com/science/article/pii/S2589004220301723 |