A study on fast calling variants from next-generation sequencing data using decision tree

Abstract. Background: The rapid development of next-generation sequencing (NGS) technology has steadily increased the throughput of sequencing data. However, because no available tool is both fast and accurate, analyzing NGS data, especially low-coverage data, re...


Bibliographic Details
Main Authors: Zhentang Li, Yi Wang, Fei Wang
Format: Article
Language: English
Published: BMC 2018-04-01
Series: BMC Bioinformatics
Subjects: Next-generation sequencing, Variant calling, Decision tree
Online Access: http://link.springer.com/article/10.1186/s12859-018-2147-9
_version_ 1811215750498615296
author Zhentang Li
Yi Wang
Fei Wang
author_facet Zhentang Li
Yi Wang
Fei Wang
author_sort Zhentang Li
collection DOAJ
description Abstract. Background: The rapid development of next-generation sequencing (NGS) technology has steadily increased the throughput of sequencing data. However, because no available tool is both fast and accurate, analyzing NGS data, especially low-coverage data, remains challenging. Results: We propose a decision-tree-based variant calling algorithm. Experiments on real data indicate that the algorithm achieves high accuracy and sensitivity for SNVs and indels and adapts well to low-coverage data. In particular, it is clearly faster than 3 widely used tools in our experiments. Conclusions: We implemented the algorithm in a software tool named Fuwa and applied it, together with 4 well-known variant callers (Platypus, GATK-UnifiedGenotyper, GATK-HaplotypeCaller and SAMtools), to three sequencing data sets of the well-studied sample NA12878, produced by whole-genome, whole-exome and low-coverage whole-genome sequencing, respectively. We also conducted additional experiments on the WGS data of 4 newly released samples that have not been used to populate dbSNP.
first_indexed 2024-04-12T06:27:40Z
format Article
id doaj.art-039544d9fe7a454f8aa477852cbff9ac
institution Directory of Open Access Journals
issn 1471-2105
language English
last_indexed 2024-04-12T06:27:40Z
publishDate 2018-04-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-039544d9fe7a454f8aa477852cbff9ac; 2022-12-22T03:44:06Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2018-04-01; 19; 1; 1; 14; 10.1186/s12859-018-2147-9; A study on fast calling variants from next-generation sequencing data using decision tree; Zhentang Li (0); Yi Wang (1); Fei Wang (2); (0) Shanghai Key Lab of Intelligent Information Processing; (1) MOE Key Laboratory of Contemporary Anthropology and State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Developmental Biology and School of Life Sciences, Fudan University; (2) Shanghai Key Lab of Intelligent Information Processing; http://link.springer.com/article/10.1186/s12859-018-2147-9; Next-generation sequencing; Variant calling; Decision tree
title A study on fast calling variants from next-generation sequencing data using decision tree
title_sort study on fast calling variants from next generation sequencing data using decision tree
topic Next-generation sequencing
Variant calling
Decision tree
url http://link.springer.com/article/10.1186/s12859-018-2147-9