A study on fast calling variants from next-generation sequencing data using decision tree
Abstract. Background: The rapid development of next-generation sequencing (NGS) technology has continuously been refreshing the throughput of sequencing data. However, due to the lack of a smart tool that is both fast and accurate, the analysis task for NGS data, especially those with low-coverage, re...
Main Authors: | Zhentang Li, Yi Wang, Fei Wang |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2018-04-01 |
Series: | BMC Bioinformatics |
Subjects: | Next-generation sequencing; Variant calling; Decision tree |
Online Access: | http://link.springer.com/article/10.1186/s12859-018-2147-9 |
_version_ | 1811215750498615296 |
---|---|
author | Zhentang Li; Yi Wang; Fei Wang |
author_facet | Zhentang Li; Yi Wang; Fei Wang |
author_sort | Zhentang Li |
collection | DOAJ |
description | Abstract. Background: The rapid development of next-generation sequencing (NGS) technology has continuously been refreshing the throughput of sequencing data. However, due to the lack of a smart tool that is both fast and accurate, the analysis task for NGS data, especially those with low-coverage, remains challenging. Results: We proposed a decision-tree based variant calling algorithm. Experiments on a set of real data indicate that our algorithm achieves high accuracy and sensitivity for SNVs and indels and shows good adaptability on low-coverage data. In particular, our algorithm is obviously faster than 3 widely used tools in our experiments. Conclusions: We implemented our algorithm in a software named Fuwa and applied it together with 4 well-known variant callers, i.e., Platypus, GATK-UnifiedGenotyper, GATK-HaplotypeCaller and SAMtools, to three sequencing data sets of a well-studied sample NA12878, which were produced by whole-genome, whole-exome and low-coverage whole-genome sequencing technology respectively. We also conducted additional experiments on the WGS data of 4 newly released samples that have not been used to populate dbSNP. |
first_indexed | 2024-04-12T06:27:40Z |
format | Article |
id | doaj.art-039544d9fe7a454f8aa477852cbff9ac |
institution | Directory Open Access Journal |
issn | 1471-2105 |
language | English |
last_indexed | 2024-04-12T06:27:40Z |
publishDate | 2018-04-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj.art-039544d9fe7a454f8aa477852cbff9ac; 2022-12-22T03:44:06Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2018-04-01; 19; 1; 1; 14; 10.1186/s12859-018-2147-9; A study on fast calling variants from next-generation sequencing data using decision tree; Zhentang Li (Shanghai Key Lab of Intelligent Information Processing); Yi Wang (MOE Key Laboratory of Contemporary Anthropology and State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Developmental Biology and School of Life Sciences, Fudan University); Fei Wang (Shanghai Key Lab of Intelligent Information Processing); Abstract. Background: The rapid development of next-generation sequencing (NGS) technology has continuously been refreshing the throughput of sequencing data. However, due to the lack of a smart tool that is both fast and accurate, the analysis task for NGS data, especially those with low-coverage, remains challenging. Results: We proposed a decision-tree based variant calling algorithm. Experiments on a set of real data indicate that our algorithm achieves high accuracy and sensitivity for SNVs and indels and shows good adaptability on low-coverage data. In particular, our algorithm is obviously faster than 3 widely used tools in our experiments. Conclusions: We implemented our algorithm in a software named Fuwa and applied it together with 4 well-known variant callers, i.e., Platypus, GATK-UnifiedGenotyper, GATK-HaplotypeCaller and SAMtools, to three sequencing data sets of a well-studied sample NA12878, which were produced by whole-genome, whole-exome and low-coverage whole-genome sequencing technology respectively. We also conducted additional experiments on the WGS data of 4 newly released samples that have not been used to populate dbSNP.; http://link.springer.com/article/10.1186/s12859-018-2147-9; Next-generation sequencing; Variant calling; Decision tree |
spellingShingle | Zhentang Li; Yi Wang; Fei Wang; A study on fast calling variants from next-generation sequencing data using decision tree; BMC Bioinformatics; Next-generation sequencing; Variant calling; Decision tree |
title | A study on fast calling variants from next-generation sequencing data using decision tree |
title_sort | study on fast calling variants from next generation sequencing data using decision tree |
topic | Next-generation sequencing; Variant calling; Decision tree |
url | http://link.springer.com/article/10.1186/s12859-018-2147-9 |