Evaluating the performance of low-frequency variant calling tools for the detection of variants from short-read deep sequencing data
Abstract Detection of low-frequency variants with high accuracy plays an important role in biomedical research and clinical practice. However, it is challenging to do so with next-generation sequencing (NGS) approaches due to the high error rates of NGS. To accurately distinguish low-level true vari...
Main Authors: | Xudong Xiang, Bowen Lu, Dongyang Song, Jie Li, Kunxian Shu, Dan Pu |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2023-11-01 |
Series: | Scientific Reports |
Online Access: | https://doi.org/10.1038/s41598-023-47135-3 |
_version_ | 1797452557482721280 |
---|---|
author | Xudong Xiang, Bowen Lu, Dongyang Song, Jie Li, Kunxian Shu, Dan Pu |
author_facet | Xudong Xiang, Bowen Lu, Dongyang Song, Jie Li, Kunxian Shu, Dan Pu |
author_sort | Xudong Xiang |
collection | DOAJ |
description | Abstract: Detection of low-frequency variants with high accuracy plays an important role in biomedical research and clinical practice. However, this is challenging with next-generation sequencing (NGS) approaches because of the high error rates of NGS. To accurately distinguish true low-level variants from these errors, many statistical variant-calling tools for low-frequency variants have been proposed, but a systematic performance comparison of these tools has not yet been performed. Here, we evaluated four raw-reads-based variant callers (SiNVICT, outLyzer, Pisces, and LoFreq) and four UMI-based variant callers (DeepSNVMiner, MAGERI, smCounter2, and UMI-VarCal) on their ability to call single nucleotide variants (SNVs) with allelic frequencies as low as 0.025% in deep sequencing data. We analyzed a total of 54 simulated datasets with various sequencing depths and variant allele frequencies (VAFs), two reference datasets, and Horizon Tru-Q sample data. The results showed that the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers in detection limit. Sequencing depth had almost no effect on the UMI-based callers but significantly influenced the raw-reads-based callers. Regardless of sequencing depth, MAGERI was the fastest, while smCounter2 consistently took the longest to finish variant calling. Overall, DeepSNVMiner and UMI-VarCal performed best, with sensitivity and precision of 88% and 100% for DeepSNVMiner and 84% and 100% for UMI-VarCal. In conclusion, the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers in sensitivity and precision. We recommend DeepSNVMiner and UMI-VarCal for low-frequency variant detection. The results provide important information on future directions for reliable low-frequency variant detection and algorithm development, which is critical in genetics-based medical research and clinical applications. |
first_indexed | 2024-03-09T15:10:24Z |
format | Article |
id | doaj.art-0a2aa88857b34f2c875f722d99bba992 |
institution | Directory of Open Access Journals |
issn | 2045-2322 |
language | English |
last_indexed | 2024-03-09T15:10:24Z |
publishDate | 2023-11-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj.art-0a2aa88857b34f2c875f722d99bba992; 2023-11-26T13:23:09Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2023-11-01; 13; 1; 1; 14; 10.1038/s41598-023-47135-3; Evaluating the performance of low-frequency variant calling tools for the detection of variants from short-read deep sequencing data; Xudong Xiang (0), Bowen Lu (1), Dongyang Song (2), Jie Li (3), Kunxian Shu (4), Dan Pu (5); affiliation (0-5): Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications; https://doi.org/10.1038/s41598-023-47135-3 |
title | Evaluating the performance of low-frequency variant calling tools for the detection of variants from short-read deep sequencing data |
url | https://doi.org/10.1038/s41598-023-47135-3 |
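
The description above compares callers by sensitivity and precision. As a minimal sketch of how those two metrics are conventionally computed from a truth set and a call set (this is not the authors' evaluation code; the function name, the variant-key tuples, and the toy data are all hypothetical):

```python
def sensitivity_precision(truth, called):
    """Return (sensitivity, precision) for two collections of variant keys.

    A variant key is any hashable identifier; here a hypothetical
    (chrom, pos, ref, alt) tuple is assumed.
    """
    truth, called = set(truth), set(called)
    tp = len(truth & called)   # true variants that were called
    fn = len(truth - called)   # true variants that were missed
    fp = len(called - truth)   # calls that are not in the truth set
    sensitivity = tp / (tp + fn) if truth else float("nan")
    precision = tp / (tp + fp) if called else float("nan")
    return sensitivity, precision


if __name__ == "__main__":
    # Hypothetical toy data: 20 known SNVs, 18 recovered, 2 false calls.
    truth = [("chr1", 1000 + i, "A", "G") for i in range(20)]
    called = truth[:18] + [("chr2", 5, "C", "T"), ("chr2", 9, "G", "A")]
    sens, prec = sensitivity_precision(truth, called)
    print(f"sensitivity={sens:.0%} precision={prec:.0%}")
```

Sets are used so that duplicate calls of the same variant are counted once; a real benchmark would also need to normalize variant representation before comparing keys.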