vcfdist: accurately benchmarking phased small variant calls in human genomes
Abstract: Accurately benchmarking small variant calling accuracy is critical for the continued improvement of human whole genome sequencing. In this work, we show that current variant calling evaluations are biased towards certain variant representations and may misrepresent the relative performance...
Main Authors: | Tim Dunn, Satish Narayanasamy |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2023-12-01 |
Series: | Nature Communications |
Online Access: | https://doi.org/10.1038/s41467-023-43876-x |
_version_ | 1827590428413657088 |
---|---|
author | Tim Dunn; Satish Narayanasamy |
author_facet | Tim Dunn; Satish Narayanasamy |
author_sort | Tim Dunn |
collection | DOAJ |
description | Abstract: Accurately benchmarking small variant calling accuracy is critical for the continued improvement of human whole genome sequencing. In this work, we show that current variant calling evaluations are biased towards certain variant representations and may misrepresent the relative performance of different variant calling pipelines. We propose solutions, first exploring the affine gap parameter design space for complex variant representation and suggesting a standard. Next, we present our tool vcfdist and demonstrate the importance of enforcing local phasing for evaluation accuracy. We then introduce the notion of partial credit for mostly-correct calls and present an algorithm for clustering dependent variants. Lastly, we motivate using alignment distance metrics to supplement precision-recall curves for understanding variant calling performance. We evaluate the performance of 64 phased Truth Challenge V2 submissions and show that vcfdist improves measured insertion and deletion performance consistency across variant representations from R² = 0.97243 for baseline vcfeval to 0.99996 for vcfdist. |
first_indexed | 2024-03-09T01:17:45Z |
format | Article |
id | doaj.art-940207f68f9a42c589fd21218b34f084 |
institution | Directory Open Access Journal |
issn | 2041-1723 |
language | English |
last_indexed | 2024-03-09T01:17:45Z |
publishDate | 2023-12-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Nature Communications |
spelling | doaj.art-940207f68f9a42c589fd21218b34f084; 2023-12-10T12:24:20Z; eng; Nature Portfolio; Nature Communications; 2041-1723; 2023-12-01; 14; 1; 1; 12; 10.1038/s41467-023-43876-x; vcfdist: accurately benchmarking phased small variant calls in human genomes; Tim Dunn (Computer Science and Engineering, University of Michigan); Satish Narayanasamy (Computer Science and Engineering, University of Michigan); https://doi.org/10.1038/s41467-023-43876-x |
title | vcfdist: accurately benchmarking phased small variant calls in human genomes |
url | https://doi.org/10.1038/s41467-023-43876-x |