Evaluating the performance of low-frequency variant calling tools for the detection of variants from short-read deep sequencing data

Abstract Detection of low-frequency variants with high accuracy plays an important role in biomedical research and clinical practice. However, it is challenging to do so with next-generation sequencing (NGS) approaches due to the high error rates of NGS. To accurately distinguish low-level true vari...

Bibliographic Details
Main Authors:	Xudong Xiang, Bowen Lu, Dongyang Song, Jie Li, Kunxian Shu, Dan Pu
Format:	Article
Language:	English
Published:	Nature Portfolio 2023-11-01
Series:	Scientific Reports
Online Access:	https://doi.org/10.1038/s41598-023-47135-3

collection	DOAJ
description	Detection of low-frequency variants with high accuracy plays an important role in biomedical research and clinical practice. However, it is challenging to do so with next-generation sequencing (NGS) approaches due to the high error rates of NGS. To accurately distinguish low-level true variants from these errors, many statistical variant-calling tools for low-frequency variants have been proposed, but a systematic performance comparison of these tools has not yet been performed. Here, we evaluated four raw-reads-based variant callers (SiNVICT, outLyzer, Pisces, and LoFreq) and four UMI-based variant callers (DeepSNVMiner, MAGERI, smCounter2, and UMI-VarCal), considering their capability to call single nucleotide variants (SNVs) with allelic frequencies as low as 0.025% in deep sequencing data. We analyzed a total of 54 simulated datasets with various sequencing depths and variant allele frequencies (VAFs), two reference datasets, and Horizon Tru-Q sample data. The results showed that the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers regarding detection limit. Sequencing depth had almost no effect on the UMI-based callers but significantly influenced the raw-reads-based callers. Regardless of the sequencing depth, MAGERI was the fastest, while smCounter2 consistently took the longest to finish the variant calling process. Overall, DeepSNVMiner and UMI-VarCal performed best, with sensitivity and precision of 88% and 100% for DeepSNVMiner and 84% and 100% for UMI-VarCal. In conclusion, the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers in terms of sensitivity and precision. We recommend using DeepSNVMiner and UMI-VarCal for low-frequency variant detection. The results provide important information regarding future directions for reliable low-frequency variant detection and algorithm development, which is critical in genetics-based medical research and clinical applications.
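The abstract reports caller performance as sensitivity and precision and refers to VAFs as low as 0.025%. The sketch below shows how these quantities are conventionally computed from true-positive (TP), false-positive (FP), and false-negative (FN) counts, and how many variant-supporting reads one would expect at a given depth. It is an illustration only: the function names and all counts are hypothetical and are not taken from the article, and the article's own evaluation procedure may differ.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true variants that the caller reported: TP / (TP + FN)."""
    if tp + fn == 0:
        raise ValueError("no true variants in the truth set")
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of reported variants that are true: TP / (TP + FP)."""
    if tp + fp == 0:
        raise ValueError("the caller reported no variants")
    return tp / (tp + fp)


def expected_variant_reads(depth: int, vaf: float) -> float:
    """Expected number of reads carrying the variant allele at a site.

    `vaf` is a fraction, so a VAF of 0.025% is passed as 0.00025.
    """
    return depth * vaf


if __name__ == "__main__":
    # Hypothetical counts chosen for illustration, not data from the article.
    tp, fp, fn = 22, 0, 3
    print(f"sensitivity = {sensitivity(tp, fn):.2f}")
    print(f"precision   = {precision(tp, fp):.2f}")

    # Hypothetical depth; at a VAF of 0.025% only a handful of reads
    # are expected to support the variant.
    print(f"expected variant reads = {expected_variant_reads(20000, 0.00025):.1f}")
```

With so few supporting reads expected at very low VAFs, a caller has to separate them from sequencing errors occurring at a comparable rate, which is the difficulty the abstract describes for raw-reads-based callers.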
id	doaj.art-0a2aa88857b34f2c875f722d99bba992
institution	Directory of Open Access Journals
issn	2045-2322
spelling	doaj.art-0a2aa88857b34f2c875f722d99bba992 | 2023-11-26T13:23:09Z | eng | Nature Portfolio | Scientific Reports | 2045-2322 | 2023-11-01 | vol. 13, no. 1, pp. 1-14 | 10.1038/s41598-023-47135-3 | Evaluating the performance of low-frequency variant calling tools for the detection of variants from short-read deep sequencing data | Xudong Xiang, Bowen Lu, Dongyang Song, Jie Li, Kunxian Shu, Dan Pu (all: Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications) | https://doi.org/10.1038/s41598-023-47135-3

Similar Items