Evaluating the performance of low-frequency variant calling tools for the detection of variants from short-read deep sequencing data

Abstract Detection of low-frequency variants with high accuracy plays an important role in biomedical research and clinical practice. However, it is challenging to do so with next-generation sequencing (NGS) approaches due to the high error rates of NGS. To accurately distinguish low-level true vari...

Bibliographic Details
Main Authors: Xudong Xiang, Bowen Lu, Dongyang Song, Jie Li, Kunxian Shu, Dan Pu
Format: Article
Language: English
Published: Nature Portfolio 2023-11-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-023-47135-3
_version_ 1797452557482721280
author Xudong Xiang
Bowen Lu
Dongyang Song
Jie Li
Kunxian Shu
Dan Pu
author_facet Xudong Xiang
Bowen Lu
Dongyang Song
Jie Li
Kunxian Shu
Dan Pu
author_sort Xudong Xiang
collection DOAJ
description Abstract Detection of low-frequency variants with high accuracy plays an important role in biomedical research and clinical practice. However, this is challenging with next-generation sequencing (NGS) approaches because of the high error rates of NGS. To accurately distinguish true low-level variants from these errors, many statistical variant calling tools for low-frequency variants have been proposed, but a systematic performance comparison of these tools has not yet been performed. Here, we evaluated four raw-reads-based variant callers (SiNVICT, outLyzer, Pisces, and LoFreq) and four UMI-based variant callers (DeepSNVMiner, MAGERI, smCounter2, and UMI-VarCal) on their ability to call single nucleotide variants (SNVs) with allele frequencies as low as 0.025% in deep sequencing data. We analyzed a total of 54 simulated datasets with various sequencing depths and variant allele frequencies (VAFs), two reference datasets, and Horizon Tru-Q sample data. The results showed that the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers in detection limit. Sequencing depth had almost no effect on the UMI-based callers but significantly influenced the raw-reads-based callers. Regardless of sequencing depth, MAGERI was the fastest, while smCounter2 consistently took the longest to finish the variant calling process. Overall, DeepSNVMiner and UMI-VarCal performed best, with sensitivity and precision of 88% and 100% for DeepSNVMiner and 84% and 100% for UMI-VarCal. In conclusion, the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers in sensitivity and precision. We recommend DeepSNVMiner and UMI-VarCal for low-frequency variant detection. The results provide important information on future directions for reliable low-frequency variant detection and algorithm development, which is critical in genetics-based medical research and clinical applications.
first_indexed 2024-03-09T15:10:24Z
format Article
id doaj.art-0a2aa88857b34f2c875f722d99bba992
institution Directory of Open Access Journals
issn 2045-2322
language English
last_indexed 2024-03-09T15:10:24Z
publishDate 2023-11-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj.art-0a2aa88857b34f2c875f722d99bba992 2023-11-26T13:23:09Z eng Nature Portfolio Scientific Reports 2045-2322 2023-11-01 131114 10.1038/s41598-023-47135-3 Evaluating the performance of low-frequency variant calling tools for the detection of variants from short-read deep sequencing data Xudong Xiang 0; Bowen Lu 1; Dongyang Song 2; Jie Li 3; Kunxian Shu 4; Dan Pu 5 Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications; Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications; Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications; Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications; Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications; Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications Abstract Detection of low-frequency variants with high accuracy plays an important role in biomedical research and clinical practice. However, this is challenging with next-generation sequencing (NGS) approaches because of the high error rates of NGS. To accurately distinguish true low-level variants from these errors, many statistical variant calling tools for low-frequency variants have been proposed, but a systematic performance comparison of these tools has not yet been performed. Here, we evaluated four raw-reads-based variant callers (SiNVICT, outLyzer, Pisces, and LoFreq) and four UMI-based variant callers (DeepSNVMiner, MAGERI, smCounter2, and UMI-VarCal) on their ability to call single nucleotide variants (SNVs) with allele frequencies as low as 0.025% in deep sequencing data. We analyzed a total of 54 simulated datasets with various sequencing depths and variant allele frequencies (VAFs), two reference datasets, and Horizon Tru-Q sample data. The results showed that the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers in detection limit. Sequencing depth had almost no effect on the UMI-based callers but significantly influenced the raw-reads-based callers. Regardless of sequencing depth, MAGERI was the fastest, while smCounter2 consistently took the longest to finish the variant calling process. Overall, DeepSNVMiner and UMI-VarCal performed best, with sensitivity and precision of 88% and 100% for DeepSNVMiner and 84% and 100% for UMI-VarCal. In conclusion, the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers in sensitivity and precision. We recommend DeepSNVMiner and UMI-VarCal for low-frequency variant detection. The results provide important information on future directions for reliable low-frequency variant detection and algorithm development, which is critical in genetics-based medical research and clinical applications. https://doi.org/10.1038/s41598-023-47135-3
spellingShingle Xudong Xiang
Bowen Lu
Dongyang Song
Jie Li
Kunxian Shu
Dan Pu
Evaluating the performance of low-frequency variant calling tools for the detection of variants from short-read deep sequencing data
Scientific Reports
title Evaluating the performance of low-frequency variant calling tools for the detection of variants from short-read deep sequencing data
title_full Evaluating the performance of low-frequency variant calling tools for the detection of variants from short-read deep sequencing data
title_fullStr Evaluating the performance of low-frequency variant calling tools for the detection of variants from short-read deep sequencing data
title_full_unstemmed Evaluating the performance of low-frequency variant calling tools for the detection of variants from short-read deep sequencing data
title_short Evaluating the performance of low-frequency variant calling tools for the detection of variants from short-read deep sequencing data
title_sort evaluating the performance of low frequency variant calling tools for the detection of variants from short read deep sequencing data
url https://doi.org/10.1038/s41598-023-47135-3
work_keys_str_mv AT xudongxiang evaluatingtheperformanceoflowfrequencyvariantcallingtoolsforthedetectionofvariantsfromshortreaddeepsequencingdata
AT bowenlu evaluatingtheperformanceoflowfrequencyvariantcallingtoolsforthedetectionofvariantsfromshortreaddeepsequencingdata
AT dongyangsong evaluatingtheperformanceoflowfrequencyvariantcallingtoolsforthedetectionofvariantsfromshortreaddeepsequencingdata
AT jieli evaluatingtheperformanceoflowfrequencyvariantcallingtoolsforthedetectionofvariantsfromshortreaddeepsequencingdata
AT kunxianshu evaluatingtheperformanceoflowfrequencyvariantcallingtoolsforthedetectionofvariantsfromshortreaddeepsequencingdata
AT danpu evaluatingtheperformanceoflowfrequencyvariantcallingtoolsforthedetectionofvariantsfromshortreaddeepsequencingdata
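
Note on the metrics named in the abstract: sensitivity and precision are the standard ratios computed from true-positive, false-negative, and false-positive calls, and a variant allele frequency (VAF) translates into an expected number of variant-supporting reads at a given depth. The Python sketch below illustrates these definitions only. The function names and all input counts are hypothetical and are not taken from the article, and the code does not reproduce the authors' evaluation pipeline.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of real variants that a caller reports: TP / (TP + FN)."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no real variants to evaluate against")
    return true_positives / total


def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of reported variants that are real: TP / (TP + FP)."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("the caller reported no variants")
    return true_positives / total


def expected_variant_reads(depth: int, vaf_percent: float) -> float:
    """Expected number of reads carrying the variant at a site.

    depth is the number of reads covering the site and vaf_percent is
    the variant allele frequency expressed as a percentage.
    """
    return depth * vaf_percent / 100.0


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the formulas.
    tp, fn, fp = 8, 2, 2
    print(f"sensitivity = {sensitivity(tp, fn):.2f}")
    print(f"precision   = {precision(tp, fp):.2f}")

    # Hypothetical depth; 0.025% is the lowest VAF named in the abstract.
    reads = expected_variant_reads(20000, 0.025)
    print(f"expected variant reads = {reads:.1f}")
```

At a VAF of 0.025%, even a hypothetical depth of 20,000 reads yields only a handful of expected variant-supporting reads, so a caller has little signal with which to separate true variants from sequencing errors at that frequency.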