Performance analysis of conventional and AI-based variant callers using short and long reads

Abstract Background: The accurate detection of variants is essential for genomics-based studies. Currently, there are various tools designed to detect genomic variants; however, it has always been a challenge to decide which tool to use, especially when various major genome projects have chosen to us...

Bibliographic Details
Main Authors:	Omar Abdelwahab, François Belzile, Davoud Torkamaneh
Format:	Article
Language:	English
Published:	BMC 2023-12-01
Series:	BMC Bioinformatics
Subjects:	Genomics Sequencing Variant calling NGS Artificial intelligence
Online Access:	https://doi.org/10.1186/s12859-023-05596-3

_version_	1797388077909409792
author	Omar Abdelwahab François Belzile Davoud Torkamaneh
author_facet	Omar Abdelwahab François Belzile Davoud Torkamaneh
author_sort	Omar Abdelwahab
collection	DOAJ
description	Abstract Background: The accurate detection of variants is essential for genomics-based studies. Currently, there are various tools designed to detect genomic variants; however, it has always been a challenge to decide which tool to use, especially when various major genome projects have chosen to use different tools. Thus far, most of the existing tools were mainly developed to work on short-read data (i.e., Illumina); however, other sequencing technologies (e.g., PacBio and Oxford Nanopore) have recently shown that they can also be used for variant calling. In addition, with the emergence of artificial intelligence (AI)-based variant calling tools, there is a pressing need to compare these tools in terms of efficiency, accuracy, computational power, and ease of use. Results: In this study, we evaluated five of the most widely used conventional and AI-based variant calling tools (BCFTools, GATK4, Platypus, DNAscope, and DeepVariant) in terms of accuracy and computational cost using both short-read and long-read data derived from three different sequencing technologies (Illumina, PacBio HiFi, and ONT) for the same set of samples from the Genome In A Bottle project. The analysis showed that AI-based variant calling tools outperform conventional ones for calling SNVs and INDELs using both long and short reads in most aspects. In addition, we demonstrate the advantages and drawbacks of each tool while ranking them in each aspect of these comparisons. Conclusion: This study provides best practices for variant calling using AI-based and conventional variant callers with different types of sequencing data.
first_indexed	2024-03-08T22:34:31Z
format	Article
id	doaj.art-a37b96b7a61545c8a3eb55bfdf6f8930
institution	Directory Open Access Journal
issn	1471-2105
language	English
last_indexed	2024-03-08T22:34:31Z
publishDate	2023-12-01
publisher	BMC
record_format	Article
series	BMC Bioinformatics
spelling	doaj.art-a37b96b7a61545c8a3eb55bfdf6f8930; 2023-12-17T12:31:55Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2023-12-01; 241113; 10.1186/s12859-023-05596-3; Performance analysis of conventional and AI-based variant callers using short and long reads; Omar Abdelwahab 0; François Belzile 1; Davoud Torkamaneh 2; Département de Phytologie, Université Laval; Département de Phytologie, Université Laval; Département de Phytologie, Université Laval; Abstract Background: The accurate detection of variants is essential for genomics-based studies. Currently, there are various tools designed to detect genomic variants; however, it has always been a challenge to decide which tool to use, especially when various major genome projects have chosen to use different tools. Thus far, most of the existing tools were mainly developed to work on short-read data (i.e., Illumina); however, other sequencing technologies (e.g., PacBio and Oxford Nanopore) have recently shown that they can also be used for variant calling. In addition, with the emergence of artificial intelligence (AI)-based variant calling tools, there is a pressing need to compare these tools in terms of efficiency, accuracy, computational power, and ease of use. Results: In this study, we evaluated five of the most widely used conventional and AI-based variant calling tools (BCFTools, GATK4, Platypus, DNAscope, and DeepVariant) in terms of accuracy and computational cost using both short-read and long-read data derived from three different sequencing technologies (Illumina, PacBio HiFi, and ONT) for the same set of samples from the Genome In A Bottle project. The analysis showed that AI-based variant calling tools outperform conventional ones for calling SNVs and INDELs using both long and short reads in most aspects. In addition, we demonstrate the advantages and drawbacks of each tool while ranking them in each aspect of these comparisons. Conclusion: This study provides best practices for variant calling using AI-based and conventional variant callers with different types of sequencing data.; https://doi.org/10.1186/s12859-023-05596-3; Genomics; Sequencing; Variant calling; NGS; Artificial intelligence
title	Performance analysis of conventional and AI-based variant callers using short and long reads
topic	Genomics Sequencing Variant calling NGS Artificial intelligence
url	https://doi.org/10.1186/s12859-023-05596-3

Similar Items