Performance analysis of conventional and AI-based variant callers using short and long reads

Bibliographic Details
Main Authors: Omar Abdelwahab, François Belzile, Davoud Torkamaneh
Format: Article
Language: English
Published: BMC 2023-12-01
Series: BMC Bioinformatics
Subjects: Genomics, Sequencing, Variant calling, NGS, Artificial intelligence
Online Access: https://doi.org/10.1186/s12859-023-05596-3
author Omar Abdelwahab
François Belzile
Davoud Torkamaneh
collection DOAJ
description Abstract. Background: The accurate detection of variants is essential for genomics-based studies. Various tools are currently available for detecting genomic variants; however, deciding which tool to use has always been a challenge, especially since major genome projects have chosen different tools. Thus far, most existing tools were developed mainly for short-read data (i.e., Illumina); however, other sequencing technologies (e.g., PacBio and Oxford Nanopore) have recently been shown to be suitable for variant calling as well. In addition, with the emergence of artificial intelligence (AI)-based variant calling tools, there is a pressing need to compare these tools in terms of efficiency, accuracy, computational cost, and ease of use. Results: In this study, we evaluated five of the most widely used conventional and AI-based variant calling tools (BCFtools, GATK4, Platypus, DNAscope, and DeepVariant) in terms of accuracy and computational cost, using both short-read and long-read data derived from three different sequencing technologies (Illumina, PacBio HiFi, and ONT) for the same set of samples from the Genome in a Bottle project. The analysis showed that AI-based variant calling tools outperform conventional ones in most aspects when calling SNVs and INDELs from both long and short reads. In addition, we demonstrate the advantages and drawbacks of each tool while ranking them in each aspect of these comparisons. Conclusion: This study provides best practices for variant calling using AI-based and conventional variant callers with different types of sequencing data.
first_indexed 2024-03-08T22:34:31Z
format Article
id doaj.art-a37b96b7a61545c8a3eb55bfdf6f8930
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-03-08T22:34:31Z
publishDate 2023-12-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-a37b96b7a61545c8a3eb55bfdf6f8930; 2023-12-17T12:31:55Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2023-12-01; vol. 24, no. 1, pp. 1-13; 10.1186/s12859-023-05596-3; Performance analysis of conventional and AI-based variant callers using short and long reads; Omar Abdelwahab (0), François Belzile (1), Davoud Torkamaneh (2); (0) Département de Phytologie, Université Laval; (1) Département de Phytologie, Université Laval; (2) Département de Phytologie, Université Laval; https://doi.org/10.1186/s12859-023-05596-3; Genomics; Sequencing; Variant calling; NGS; Artificial intelligence
title Performance analysis of conventional and AI-based variant callers using short and long reads
topic Genomics
Sequencing
Variant calling
NGS
Artificial intelligence
url https://doi.org/10.1186/s12859-023-05596-3