Metagenomic binning through low-density hashing

Motivation: Vastly greater quantities of microbial genome data are being generated where environmental samples mix together the DNA from many different species. Here, we present Opal for metagenomic binning, the task of identifying the origin species of DNA sequencing reads. We introduce low-density...

Bibliographic Details
Main Authors: Luo, Yunan; Yu, Yun William; Zeng, Jianyang; Berger Leighton, Bonnie; Peng, Jian
Other Authors: Massachusetts Institute of Technology. Department of Mathematics
Format: Article
Language: English
Published: Oxford University Press (OUP) 2019
Online Access: https://hdl.handle.net/1721.1/122806
_version_ 1826200120145739776
author Luo, Yunan
Yu, Yun William
Zeng, Jianyang
Berger Leighton, Bonnie
Peng, Jian
author2 Massachusetts Institute of Technology. Department of Mathematics
author_facet Massachusetts Institute of Technology. Department of Mathematics
Luo, Yunan
Yu, Yun William
Zeng, Jianyang
Berger Leighton, Bonnie
Peng, Jian
author_sort Luo, Yunan
collection MIT
description Motivation: Vastly greater quantities of microbial genome data are being generated where environmental samples mix together the DNA from many different species. Here, we present Opal for metagenomic binning, the task of identifying the origin species of DNA sequencing reads. We introduce 'low-density' locality sensitive hashing to bioinformatics, with the addition of Gallager codes for even coverage, enabling quick and accurate metagenomic binning. Results: On public benchmarks, Opal halves the error on precision/recall (F1-score) as compared with both alignment-based and alignment-free methods for species classification. We demonstrate even more marked improvement at higher taxonomic levels, allowing for the discovery of novel lineages. Furthermore, the innovation of low-density, even-coverage hashing should itself prove an essential methodological advance as it enables the application of machine learning to other bioinformatic challenges. Availability and implementation: Full source code and datasets are available at http://opal.csail.mit.edu and https://github.com/yunwilliamyu/opal. Supplementary information: Supplementary data are available at Bioinformatics online.
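The idea of low-density, even-coverage hashing named in the abstract can be illustrated with a minimal sketch. This is not Opal's implementation: the function names `make_masks` and `spaced_features`, the parameters `k`, `t` and `num_hashes`, and the shuffle-and-split construction are all assumptions chosen for illustration. The sketch assumes that "low-density" means each hash reads only `t` of the `k` positions in a k-mer, and it substitutes a simple permutation-based scheme for Gallager codes so that every position is used equally often.

```python
import random
from collections import Counter


def make_masks(k, t, num_hashes, seed=0):
    """Choose num_hashes position subsets of size t from a k-mer.

    Illustrative stand-in for an even-coverage construction: each round
    shuffles the k positions and cuts them into disjoint blocks of t, so
    one full round uses every position exactly once (when t divides k).
    """
    rng = random.Random(seed)
    masks = []
    while len(masks) < num_hashes:
        perm = list(range(k))
        rng.shuffle(perm)
        for i in range(0, k - t + 1, t):
            masks.append(tuple(sorted(perm[i:i + t])))
            if len(masks) == num_hashes:
                break
    return masks


def spaced_features(read, k, masks):
    """Count (mask index, sampled characters) features over all k-mers of a read."""
    feats = Counter()
    for start in range(len(read) - k + 1):
        kmer = read[start:start + k]
        for mask_id, mask in enumerate(masks):
            # Only the t masked positions contribute, so a mismatch
            # outside the mask leaves this feature unchanged.
            key = "".join(kmer[p] for p in mask)
            feats[(mask_id, key)] += 1
    return feats


if __name__ == "__main__":
    masks = make_masks(k=8, t=4, num_hashes=4, seed=1)
    coverage = Counter(p for m in masks for p in m)
    print("masks:", masks)
    print("uses per position:", dict(sorted(coverage.items())))
    print("features:", spaced_features("ACGTACGTAC", 8, masks))
```

In this sketch, two k-mers that differ at a single position still share every feature whose mask skips that position, which is the locality-sensitive behaviour; the feature counts could then be passed to a classifier of the reader's choice.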
first_indexed 2024-09-23T11:31:30Z
format Article
id mit-1721.1/122806
institution Massachusetts Institute of Technology
language English
last_indexed 2024-09-23T11:31:30Z
publishDate 2019
publisher Oxford University Press (OUP)
record_format dspace
spelling mit-1721.1/122806 2022-09-27T20:06:14Z Metagenomic binning through low-density hashing Luo, Yunan Yu, Yun William Zeng, Jianyang Berger Leighton, Bonnie Peng, Jian Massachusetts Institute of Technology. Department of Mathematics Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Motivation: Vastly greater quantities of microbial genome data are being generated where environmental samples mix together the DNA from many different species. Here, we present Opal for metagenomic binning, the task of identifying the origin species of DNA sequencing reads. We introduce 'low-density' locality sensitive hashing to bioinformatics, with the addition of Gallager codes for even coverage, enabling quick and accurate metagenomic binning. Results: On public benchmarks, Opal halves the error on precision/recall (F1-score) as compared with both alignment-based and alignment-free methods for species classification. We demonstrate even more marked improvement at higher taxonomic levels, allowing for the discovery of novel lineages. Furthermore, the innovation of low-density, even-coverage hashing should itself prove an essential methodological advance as it enables the application of machine learning to other bioinformatic challenges. Availability and implementation: Full source code and datasets are available at http://opal.csail.mit.edu and https://github.com/yunwilliamyu/opal. Supplementary information: Supplementary data are available at Bioinformatics online. National Institutes of Health (U.S.) (Grant GM108348) 2019-11-08T18:08:50Z 2019-11-08T18:08:50Z 2018-07-13 2018-06 2019-11-07T19:05:01Z Article http://purl.org/eprint/type/JournalArticle 1367-4803 1460-2059 https://hdl.handle.net/1721.1/122806 Luo, Yunan, et al. "Metagenomic binning through low-density hashing." Bioinformatics 35, 2, (January 2019): 219–226 © 2018 The Author(s) en http://dx.doi.org/10.1093/bioinformatics/bty611 Bioinformatics Creative Commons Attribution NonCommercial License 4.0 https://creativecommons.org/licenses/by-nc/4.0/ application/pdf Oxford University Press (OUP) Oxford University Press