A big data approach to metagenomics for all-food-sequencing

Bibliographic Details
Main Authors:	Robin Kobus, José M. Abuín, André Müller, Sören Lukas Hellmann, Juan C. Pichel, Tomás F. Pena, Andreas Hildebrandt, Thomas Hankeln, Bertil Schmidt
Format:	Article
Language:	English
Published:	BMC 2020-03-01
Series:	BMC Bioinformatics
Subjects:	Next-generation sequencing; Metagenomics; Species identification; Eukaryotic genomes; Locality sensitive hashing; Big data
Online Access:	http://link.springer.com/article/10.1186/s12859-020-3429-6
ISSN:	1471-2105
DOI:	10.1186/s12859-020-3429-6
Collection:	DOAJ (Directory of Open Access Journals)
Affiliations:	Department of Computer Science, Johannes Gutenberg University (Robin Kobus, André Müller, Andreas Hildebrandt, Bertil Schmidt); IPCA, Polytechnic Institute of Cávado and Ave (José M. Abuín); Molecular Genetics and Genome Analysis, Institute of Organismal and Molecular Evolution, Johannes Gutenberg University (Sören Lukas Hellmann, Thomas Hankeln); CiTIUS, Universidade de Santiago de Compostela (Juan C. Pichel, Tomás F. Pena)

Abstract

Background: All-Food-Sequencing (AFS) is an untargeted metagenomic sequencing method that allows for the detection and quantification of food ingredients, including animals, plants, and microbiota. While this approach avoids some of the shortcomings of targeted PCR-based methods, it requires the comparison of sequence reads to large collections of reference genomes. The steadily increasing number of available reference genomes establishes the need for efficient big data approaches.

Results: We introduce an alignment-free, k-mer-based method for detecting and quantifying the species composition of food and other complex biological materials. It is orders of magnitude faster than our previous alignment-based AFS pipeline. Compared with the established tools CLARK, Kraken2, and Kraken2+Bracken, it achieves a lower false-positive rate and higher quantification accuracy. Furthermore, an efficient database partitioning scheme allows massive collections of reference genomes to be processed with reduced memory requirements, either on a workstation (AFS-MetaCache) or on a Spark-based compute cluster (MetaCacheSpark).

Conclusions: We present a fast yet accurate screening method for biosurveillance applications based on whole-genome shotgun sequencing, such as food testing. Because it relies on a big data approach, it scales efficiently to large collections of complex eukaryotic and bacterial reference genomes. AFS-MetaCache and MetaCacheSpark are suitable tools for broad-scale metagenomic screening applications. They are available at https://muellan.github.io/metacache/afs.html (C++ version for a workstation) and https://github.com/jmabuin/MetaCacheSpark (Spark version for big data clusters).
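The general idea behind alignment-free, k-mer-based classification with locality sensitive hashing can be illustrated with a minimal Python sketch. It is not the AFS-MetaCache implementation: the function names (kmer_hash, minhash_sketch, build_index, classify), the BLAKE2b hash, and the defaults k=16, s=16, window=128 are hypothetical choices for illustration. Reference genomes are cut into overlapping windows; each window is reduced to a MinHash sketch (its s smallest canonical k-mer hash values) stored in a hash table, and a read is assigned by voting over the references hit by its own sketch values. Quantification and database partitioning are not shown.

```python
import hashlib
from collections import Counter, defaultdict

# Complement table used to build reverse complements of DNA k-mers.
COMP = str.maketrans("ACGT", "TGCA")


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash of the canonical form of a k-mer.

    The canonical form is the lexicographically smaller of the k-mer and its
    reverse complement, so both DNA strands hash to the same value.
    """
    canon = min(kmer, kmer.translate(COMP)[::-1])
    return int.from_bytes(hashlib.blake2b(canon.encode(), digest_size=8).digest(), "big")


def minhash_sketch(seq: str, k: int, s: int) -> list:
    """Return the s smallest distinct hash values over all ACGT-only k-mers of seq."""
    seq = seq.upper()
    hashes = {
        kmer_hash(seq[i:i + k])
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    }
    return sorted(hashes)[:s]


def build_index(references: dict, k: int = 16, s: int = 16, window: int = 128) -> dict:
    """Hash table: sketch value -> names of references with a window containing it.

    Windows overlap by k - 1 bases so that every k-mer lies fully inside one window.
    """
    index = defaultdict(set)
    step = window - k + 1
    for name, genome in references.items():
        for start in range(0, max(len(genome) - k + 1, 1), step):
            for value in minhash_sketch(genome[start:start + window], k, s):
                index[value].add(name)
    return index


def classify(read: str, index: dict, k: int = 16, s: int = 16):
    """Assign a read to the reference with the most sketch-value hits, or None if no hits."""
    votes = Counter()
    for value in minhash_sketch(read, k, s):
        # .get avoids inserting empty entries into the defaultdict.
        for name in index.get(value, ()):
            votes[name] += 1
    return votes.most_common(1)[0][0] if votes else None
```

With the defaults, a read copied from one reference window (for example, bases 113 to 240 of a reference "B") has exactly that window's sketch, so all of its sketch values vote for "B". Voting on windows rather than comparing a read sketch with a whole-genome sketch is the design choice here because a short read covers only a tiny fraction of a genome, which would push a genome-level similarity estimate toward zero.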
