Performance assessment of variant calling pipelines using human whole exome sequencing and simulated data

Abstract. Background: Whole exome sequencing (WES) is a cost-effective method for identifying clinical variants, but it demands accurate variant callers. Currently available tools vary in their accuracy when predicting specific clinical variants. It may, however, be possible to find the best combination o...

Bibliographic Details
Main Authors: Manojkumar Kumaran, Umadevi Subramanian, Bharanidharan Devarajan
Format: Article
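Language: English
Published: BMC 2019-06-01
Series: BMC Bioinformatics
Subjects: Whole exome sequencing; Simulated exome data; Human reference genome; Variant calling pipelines; SNVs and InDels
Online Access: http://link.springer.com/article/10.1186/s12859-019-2928-9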
author Manojkumar Kumaran
Umadevi Subramanian
Bharanidharan Devarajan
collection DOAJ
description Abstract
Background: Whole exome sequencing (WES) is a cost-effective method for identifying clinical variants, but it demands accurate variant callers. Currently available tools vary in their accuracy when predicting specific clinical variants. It may, however, be possible to find the best combination of aligner and variant caller for accurately detecting single nucleotide variants (SNVs) and small insertions and deletions (InDels) separately. Moreover, many important aspects of InDel detection, particularly InDel length in base pairs, are overlooked when the performance of tools is compared.
Results: We assessed the performance of variant calling pipelines built from combinations of four variant callers and five aligners on human NA12878 and simulated exome data. We used high-confidence variant calls from the Genome in a Bottle (GiaB) consortium for validation, and GRCh37 and GRCh38 as the human reference genomes. Based on the performance metrics, both the BWA and Novoalign aligners performed better with the DeepVariant and SAMtools callers for detecting SNVs, and with DeepVariant and GATK for InDels. Furthermore, we obtained similar results on human NA24385 and NA24631 exome data from GiaB.
Conclusion: In this study, DeepVariant with BWA and Novoalign performed best for accurately detecting SNVs and InDels. The accuracy of variant calling was improved by merging the top-performing pipelines. The results of our study provide useful recommendations for the analysis of WES data in clinical genomics.
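The validation described in the abstract, comparing a pipeline's calls against a high-confidence call set, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the record does not state which performance metrics the authors used or how they merged pipelines, so precision, recall and F1 are stand-ins, merging is modelled as a plain set union, and the variants are invented toy data. The `Variant` type and the helper names are hypothetical. Matching here is exact on (chromosome, position, ref, alt), which ignores the representation differences that dedicated benchmarking tools normalise.

```python
from typing import Dict, NamedTuple, Set


class Variant(NamedTuple):
    """Toy variant record (hypothetical, not a real VCF parser)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def is_snv(v: Variant) -> bool:
    # A single-base substitution has one-base ref and alt alleles.
    return len(v.ref) == 1 and len(v.alt) == 1


def indel_length(v: Variant) -> int:
    # Length of the inserted or deleted sequence in base pairs.
    return abs(len(v.ref) - len(v.alt))


def subset(variants: Set[Variant], snv: bool) -> Set[Variant]:
    # Keep only SNVs (snv=True) or only InDels (snv=False).
    return {v for v in variants if is_snv(v) == snv}


def score(calls: Set[Variant], truth: Set[Variant]) -> Dict[str, float]:
    # Exact-match comparison of a call set against the truth set.
    tp = len(calls & truth)
    fp = len(calls - truth)
    fn = len(truth - calls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Invented toy data standing in for a truth set and two pipelines.
truth = {
    Variant("1", 100, "A", "G"),
    Variant("1", 200, "C", "T"),
    Variant("2", 300, "AT", "A"),    # 1 bp deletion
    Variant("2", 400, "G", "GCC"),   # 2 bp insertion
}
pipeline_a = {
    Variant("1", 100, "A", "G"),
    Variant("1", 200, "C", "T"),
    Variant("3", 500, "T", "C"),     # not in the truth set
}
pipeline_b = {
    Variant("1", 200, "C", "T"),
    Variant("2", 300, "AT", "A"),
    Variant("3", 600, "G", "GA"),    # not in the truth set
}

# Assumption: "merging" is modelled as the union of the two call sets.
merged = pipeline_a | pipeline_b

for name, calls in [("A", pipeline_a), ("B", pipeline_b), ("merged", merged)]:
    print(name, "SNV", score(subset(calls, True), subset(truth, True)))
    print(name, "InDel", score(subset(calls, False), subset(truth, False)))
```

In this toy data the union recovers more true variants than either pipeline alone but also accumulates both pipelines' false calls, which is the trade-off any merging strategy has to manage. Scoring SNVs and InDels separately, and grouping InDels by `indel_length`, mirrors the separate assessment the abstract calls for.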
first_indexed 2024-12-12T10:10:44Z
format Article
id doaj.art-f594c2bd79884f0890f236e2fc58e2db
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-12T10:10:44Z
publishDate 2019-06-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-f594c2bd79884f0890f236e2fc58e2db; 2022-12-22T00:27:49Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2019-06-01; volume 20, issue 1, pages 1-11; 10.1186/s12859-019-2928-9; Manojkumar Kumaran, Umadevi Subramanian, Bharanidharan Devarajan (all: Department of Bioinformatics, Aravind Medical Research Foundation); http://link.springer.com/article/10.1186/s12859-019-2928-9
title Performance assessment of variant calling pipelines using human whole exome sequencing and simulated data
topic Whole exome sequencing
Simulated exome data
Human reference genome
Variant calling pipelines
SNVs and InDels
url http://link.springer.com/article/10.1186/s12859-019-2928-9