Benchmarking Low-Frequency Variant Calling With Long-Read Data on Mitochondrial DNA

Bibliographic Details
Main Authors: Theresa Lüth, Susen Schaake, Anne Grünewald, Patrick May, Joanne Trinh, Hansi Weissensteiner
Format: Article
Language: English
Published: Frontiers Media S.A. 2022-05-01
Series: Frontiers in Genetics
Online Access: https://www.frontiersin.org/articles/10.3389/fgene.2022.887644/full
Description:

Background: Sequencing quality has improved over the last decade for long reads, allowing for more accurate detection of somatic low-frequency variants. In this study, we used mixtures of mitochondrial samples with different haplogroups (i.e., a specific set of mitochondrial variants) to investigate the applicability of nanopore sequencing for low-frequency single nucleotide variant detection.

Methods: We investigated the impact of base-calling, alignment/mapping, quality control steps, and variant calling by comparing the results to a previously derived short-read gold standard generated on the Illumina NextSeq. For nanopore sequencing, six mixtures of four different haplotypes were prepared, allowing us to reliably check for expected variants at the predefined 5%, 2%, and 1% mixture levels. We used two different versions of Guppy for base-calling, two aligners (i.e., Minimap2 and Ngmlr), and three variant callers (i.e., Mutserve2, Freebayes, and Nanopanel2) to compare low-frequency variants. We used F1 score measurements to assess the performance of variant calling.

Results: We observed a mean read length of 11 kb and a mean overall read quality of 15. Ngmlr showed not only higher F1 scores but also higher allele frequencies (AF) of false-positive calls across the mixtures (mean F1 score = 0.83; false-positive AF < 0.17) compared to Minimap2 (mean F1 score = 0.82; false-positive AF < 0.06). Mutserve2 had the highest F1 scores (5% level: F1 score > 0.99, 2% level: F1 score > 0.54, and 1% level: F1 score > 0.70) across all callers and mixture levels.

Conclusion: We here present the benchmarking for low-frequency variant calling with nanopore sequencing by identifying current limitations.
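The abstract reports F1 scores but does not spell out how they were computed. As a minimal sketch, assuming the standard definition (harmonic mean of precision and recall) and treating each variant as a (position, alternate base) pair, the score for one mixture could be calculated as follows. The function name `f1_score` and the example variants are hypothetical illustrations, not values or code from the study.

```python
def f1_score(expected, called):
    """Return the F1 score of called variants against an expected set.

    Both arguments are iterables of hashable variant keys, here
    (position, alternate_base) tuples. This is the textbook definition,
    assumed for illustration; the study's own scoring may differ in detail.
    """
    expected, called = set(expected), set(called)
    true_pos = len(expected & called)    # expected variants that were called
    false_pos = len(called - expected)   # calls not in the expected set
    false_neg = len(expected - called)   # expected variants that were missed
    if true_pos == 0:
        return 0.0                       # avoids division by zero
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Made-up example: four expected variants, three recovered, one spurious call.
expected = {(73, "G"), (263, "G"), (750, "G"), (1438, "G")}
called = {(73, "G"), (263, "G"), (750, "G"), (310, "C")}
print(f1_score(expected, called))
```

Because F1 penalizes both missed variants and spurious calls, a caller cannot score well simply by reporting many candidate sites, which is relevant when true variants sit at the 1% to 5% level close to the sequencing error rate.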
ISSN: 1664-8021
Volume: 13
DOI: 10.3389/fgene.2022.887644
Affiliations:
Theresa Lüth: Institute of Neurogenetics, University of Lübeck and University Hospital Schleswig-Holstein, Lübeck, Germany
Susen Schaake: Institute of Neurogenetics, University of Lübeck and University Hospital Schleswig-Holstein, Lübeck, Germany
Anne Grünewald: Institute of Neurogenetics, University of Lübeck and University Hospital Schleswig-Holstein, Lübeck, Germany; Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
Patrick May: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
Joanne Trinh: Institute of Neurogenetics, University of Lübeck and University Hospital Schleswig-Holstein, Lübeck, Germany
Hansi Weissensteiner: Institute of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, Austria
Keywords: nanopore sequencing, long-read, mtDNA, heteroplasmy, benchmarking, mixtures