vcfdist: accurately benchmarking phased small variant calls in human genomes

Abstract Accurately benchmarking small variant calling accuracy is critical for the continued improvement of human whole genome sequencing. In this work, we show that current variant calling evaluations are biased towards certain variant representations and may misrepresent the relative performance...

Bibliographic Details
Main Authors: Tim Dunn, Satish Narayanasamy
Format: Article
Language: English
Published: Nature Portfolio 2023-12-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-023-43876-x
_version_ 1827590428413657088
author Tim Dunn
Satish Narayanasamy
collection DOAJ
description Abstract Accurately benchmarking small variant calling accuracy is critical for the continued improvement of human whole genome sequencing. In this work, we show that current variant calling evaluations are biased towards certain variant representations and may misrepresent the relative performance of different variant calling pipelines. We propose solutions, first exploring the affine gap parameter design space for complex variant representation and suggesting a standard. Next, we present our tool vcfdist and demonstrate the importance of enforcing local phasing for evaluation accuracy. We then introduce the notion of partial credit for mostly-correct calls and present an algorithm for clustering dependent variants. Lastly, we motivate using alignment distance metrics to supplement precision-recall curves for understanding variant calling performance. We evaluate the performance of 64 phased Truth Challenge V2 submissions and show that vcfdist improves measured insertion and deletion performance consistency across variant representations from R² = 0.97243 for baseline vcfeval to 0.99996 for vcfdist.
first_indexed 2024-03-09T01:17:45Z
format Article
id doaj.art-940207f68f9a42c589fd21218b34f084
institution Directory Open Access Journal
issn 2041-1723
language English
last_indexed 2024-03-09T01:17:45Z
publishDate 2023-12-01
publisher Nature Portfolio
record_format Article
series Nature Communications
spelling doaj.art-940207f68f9a42c589fd21218b34f084 2023-12-10T12:24:20Z eng Nature Portfolio Nature Communications 2041-1723 2023-12-01 14 1 1 12 10.1038/s41467-023-43876-x vcfdist: accurately benchmarking phased small variant calls in human genomes Tim Dunn 0 Satish Narayanasamy 1 Computer Science and Engineering, University of Michigan Computer Science and Engineering, University of Michigan Abstract Accurately benchmarking small variant calling accuracy is critical for the continued improvement of human whole genome sequencing. In this work, we show that current variant calling evaluations are biased towards certain variant representations and may misrepresent the relative performance of different variant calling pipelines. We propose solutions, first exploring the affine gap parameter design space for complex variant representation and suggesting a standard. Next, we present our tool vcfdist and demonstrate the importance of enforcing local phasing for evaluation accuracy. We then introduce the notion of partial credit for mostly-correct calls and present an algorithm for clustering dependent variants. Lastly, we motivate using alignment distance metrics to supplement precision-recall curves for understanding variant calling performance. We evaluate the performance of 64 phased Truth Challenge V2 submissions and show that vcfdist improves measured insertion and deletion performance consistency across variant representations from R² = 0.97243 for baseline vcfeval to 0.99996 for vcfdist. https://doi.org/10.1038/s41467-023-43876-x
title vcfdist: accurately benchmarking phased small variant calls in human genomes
url https://doi.org/10.1038/s41467-023-43876-x