An improved statistical model for taxonomic assignment of metagenomics

Abstract. Background: With advances in next-generation sequencing technologies, researchers can now rapidly examine the composition of samples from humans and their surroundings. To enhance the accuracy of taxonomic assignment in metagenomic samples, we developed a method that allows multiple...


Bibliographic Details
Main Authors: Yujing Yao, Zhezhen Jin, Joseph H Lee
Format: Article
Language: English
Published: BMC 2018-10-01
Series: BMC Genetics
Subjects: EM algorithm, Metagenomics, Taxonomic assignment
Online Access: http://link.springer.com/article/10.1186/s12863-018-0680-1
_version_ 1818216722181652480
author Yujing Yao
Zhezhen Jin
Joseph H Lee
collection DOAJ
description Abstract. Background: With advances in next-generation sequencing technologies, researchers can now rapidly examine the composition of samples from humans and their surroundings. To enhance the accuracy of taxonomic assignment in metagenomic samples, we developed a method that allows a different mismatch probability for each genome. Results: We extended the algorithm for taxonomic assignment of metagenomic sequence reads (TAMER) with a method that sets a separate mismatch probability for each genome rather than imposing a single parameter on all genomes, thereby improving accuracy. This method, which we call TADIP (Taxonomic Assignment of metagenomics based on DIfferent Probabilities), was tested comprehensively on simulated and real datasets. The results indicate that TADIP improves on TAMER, especially for datasets with large sample sizes and high complexity. Conclusions: TADIP is a statistical model developed to improve the estimation accuracy of taxonomic assignment. Owing to its genome-specific mismatch probabilities and its correlated variance matrix, it performs better than TAMER on high-complexity samples.
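The central idea in the abstract, one mismatch probability per genome instead of a single shared one, can be illustrated with a small EM sketch. This is not the authors' TADIP implementation. It assumes a plain binomial mixture in which every read has a mismatch count against every candidate genome, and it omits the correlated variance matrix mentioned in the Conclusions. The function name `em_per_genome_mismatch`, its parameters, and the starting values are hypothetical.

```python
import math


def em_per_genome_mismatch(mismatches, read_length, n_iter=200, tol=1e-10):
    """Illustrative EM for a binomial mixture (an assumption, not TADIP itself).

    mismatches[i][j] is the mismatch count of read i against genome j.
    Returns (proportions, mismatch_probs), one entry per genome.
    """
    eps = 1e-9  # keeps logarithms and divisions away from zero
    n_reads = len(mismatches)
    n_genomes = len(mismatches[0])
    props = [1.0 / n_genomes] * n_genomes  # mixing proportions
    probs = [0.05] * n_genomes             # per-genome mismatch probabilities

    for _ in range(n_iter):
        # E-step: responsibility of genome j for read i, computed in log space.
        resp = []
        for row in mismatches:
            logw = []
            for j in range(n_genomes):
                p = min(max(probs[j], eps), 1.0 - eps)
                logw.append(
                    math.log(max(props[j], eps))
                    + row[j] * math.log(p)
                    + (read_length - row[j]) * math.log(1.0 - p)
                )
            top = max(logw)
            weights = [math.exp(v - top) for v in logw]
            total = sum(weights)
            resp.append([w / total for w in weights])

        # M-step: update the proportion and the mismatch probability
        # of each genome separately.
        new_props = []
        new_probs = []
        for j in range(n_genomes):
            weight = sum(r[j] for r in resp)
            weighted_mismatches = sum(
                r[j] * row[j] for r, row in zip(resp, mismatches)
            )
            new_props.append(weight / n_reads)
            if weight > eps:
                new_probs.append(weighted_mismatches / (read_length * weight))
            else:
                new_probs.append(probs[j])  # genome has no support; keep old value

        change = max(abs(a - b) for a, b in zip(new_props, props))
        props, probs = new_props, new_probs
        if change < tol:
            break

    return props, probs
```

A single shared parameter, as the abstract describes for TAMER, would correspond to pooling the weighted mismatch counts over all genomes in the M-step instead of computing one ratio per genome.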
first_indexed 2024-12-12T06:56:29Z
format Article
id doaj.art-f29fdcd7b2ba46999dfa8db3090753df
institution Directory Open Access Journal
issn 1471-2156
language English
last_indexed 2024-12-12T06:56:29Z
publishDate 2018-10-01
publisher BMC
record_format Article
series BMC Genetics
spelling doaj.art-f29fdcd7b2ba46999dfa8db3090753df; 2022-12-22T00:33:57Z; eng; BMC; BMC Genetics; ISSN 1471-2156; 2018-10-01; vol. 19, no. 1, pp. 1-11; doi:10.1186/s12863-018-0680-1; Yujing Yao (Department of Biostatistics, Columbia University); Zhezhen Jin (Department of Biostatistics, Columbia University); Joseph H Lee (Sergievsky Center, Taub Institute, and Departments of Epidemiology and Neurology, Columbia University)
title An improved statistical model for taxonomic assignment of metagenomics
topic EM algorithm
Metagenomics
Taxonomic assignment
url http://link.springer.com/article/10.1186/s12863-018-0680-1