NUMT Confounding Biases Mitochondrial Heteroplasmy Calls in Favor of the Reference Allele

Bibliographic Details
Main Authors: Hannah Maude, Mira Davidson, Natalie Charitakis, Leo Diaz, William H. T. Bowers, Eva Gradovich, Toby Andrew, Derek Huntley
Format: Article
Language: English
Published: Frontiers Media S.A., 2019-09-01
Series: Frontiers in Cell and Developmental Biology
Subjects: nuMT, mtDNA, genotype, mitochondrial variants, mitochondrial genotype, NGS
Online Access: https://www.frontiersin.org/article/10.3389/fcell.2019.00201/full
collection DOAJ
description Homology between mitochondrial DNA (mtDNA) and nuclear DNA of mitochondrial origin (nuMTs) causes confounding when short sequence reads are aligned to the reference human genome, because the true origin of a read cannot be determined. Using a systematic in silico approach, we report here the impact of all possible single-nucleotide mitochondrial variants on alignment accuracy and variant calling. A total of 49,707 possible mutations (16,569 positions × 3 alternative alleles) were introduced across the 16,569 bp reference mitochondrial genome, one variant at a time. In silico fragmentation of each mutated genome and alignment to the entire reference genome (GRCh38) revealed preferential mapping of mutated mitochondrial fragments to nuclear loci, because the variants increased the similarity of these mitochondrial loci to nuMTs. This affected a total of 807, 362, and 41 variants at 333, 144, and 27 positions when using 100, 150, and 300 bp single-end fragments, respectively. We subsequently modeled these affected variants at 50% heteroplasmy and carried out variant calling, observing a bias in the reported allele frequencies in favor of the reference allele. Four variants (chrM:6023A, chrM:4456T, chrM:5147A, and chrM:7521A), including chrM:4456T, a possible hypertension factor, caused 100% loss of coverage at the mutated position (all 100 bp single-end fragments aligned to homologous nuclear positions instead of chrM), rendering these variants undetectable when aligning to the entire reference genome. Furthermore, four mitochondrial variants reported to be pathogenic were found to cause significant loss of coverage, and certain haplogroup-defining SNPs were shown to exacerbate the loss of coverage caused by surrounding variants. Increased fragment length and the use of paired-end reads both improved alignment accuracy.
first_indexed 2024-12-13T12:19:14Z
id doaj.art-fb8839994eed4f119d8981c64a7487ce
institution Directory of Open Access Journals
issn 2296-634X
last_indexed 2024-12-13T12:19:14Z
spelling doaj.art-fb8839994eed4f119d8981c64a7487ce
2022-12-21T23:46:37Z
eng
Frontiers Media S.A.
Frontiers in Cell and Developmental Biology
2296-634X
2019-09-01
7
10.3389/fcell.2019.00201
467769
Hannah Maude: Department of Life Sciences, Imperial College London, London, United Kingdom
Hannah Maude: Section of Genomics of Common Disease, Imperial College London, London, United Kingdom
Mira Davidson: Department of Life Sciences, Imperial College London, London, United Kingdom
Natalie Charitakis: Department of Life Sciences, Imperial College London, London, United Kingdom
Leo Diaz: Department of Life Sciences, Imperial College London, London, United Kingdom
William H. T. Bowers: Department of Life Sciences, Imperial College London, London, United Kingdom
Eva Gradovich: Department of Life Sciences, Imperial College London, London, United Kingdom
Toby Andrew: Section of Genomics of Common Disease, Imperial College London, London, United Kingdom
Derek Huntley: Department of Life Sciences, Imperial College London, London, United Kingdom
topic nuMT
mtDNA
genotype
mitochondrial variants
mitochondrial genotype
NGS
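The in silico approach summarized in the description above (every possible substitution introduced one at a time, then cut into single-end fragments that span the mutated site) can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the authors' pipeline: the function names `enumerate_snvs`, `mutate`, `fragments_covering`, and `observed_alt_fraction` are hypothetical, fragments are tiled at every start position (step 1), the circularity of chrM is ignored, and the alignment to GRCh38 itself (which needs an external aligner) is not shown. The last function is a deliberately simplified toy model, assumed here for illustration, of how alternative-allele reads lost to nuclear nuMT loci pull the observed allele fraction toward the reference allele at 50% heteroplasmy.

```python
from collections.abc import Iterator

BASES = "ACGT"


def enumerate_snvs(reference: str) -> Iterator[tuple[int, str, str]]:
    """Yield every single-nucleotide substitution as (1-based position, ref, alt).

    For a sequence made only of A/C/G/T this yields 3 alternatives per position,
    i.e. 3 * len(reference) variants; a non-ACGT base such as 'N' yields 4.
    """
    for index, ref in enumerate(reference.upper()):
        for alt in BASES:
            if alt != ref:
                yield index + 1, ref, alt


def mutate(reference: str, position: int, alt: str) -> str:
    """Return a copy of `reference` carrying a single variant at 1-based `position`."""
    return reference[: position - 1] + alt + reference[position:]


def fragments_covering(sequence: str, position: int, length: int) -> list[str]:
    """Return every single-end fragment of `length` bp (step 1) overlapping the
    1-based `position`. Assumption: the sequence is treated as linear, so
    fragments spanning the origin of the circular chrM are not generated."""
    first_start = max(0, position - length)
    last_start = min(position - 1, len(sequence) - length)
    return [sequence[s : s + length] for s in range(first_start, last_start + 1)]


def observed_alt_fraction(heteroplasmy: float, misaligned: float) -> float:
    """Toy model (an assumption, not from the source): all reference-allele reads
    align to chrM, while a fraction `misaligned` of alternative-allele reads align
    to nuclear nuMT loci and drop out of the chrM pileup."""
    alt_reads = heteroplasmy * (1.0 - misaligned)
    ref_reads = 1.0 - heteroplasmy
    total = alt_reads + ref_reads
    return alt_reads / total if total > 0 else 0.0


if __name__ == "__main__":
    # Placeholder sequence of the right length, NOT the real chrM reference.
    toy_chrM = "A" * 16569

    # One variant at a time: 16,569 positions x 3 alternative alleles.
    print(sum(1 for _ in enumerate_snvs(toy_chrM)))  # 49707

    # All 100 bp single-end fragments carrying one (hypothetical) mutated base.
    mutated = mutate(toy_chrM, 100, "C")
    print(len(fragments_covering(mutated, 100, 100)))  # 100

    # Observed alt fraction at 50% heteroplasmy as misaligned alt reads increase.
    for lost in (0.0, 0.5, 1.0):
        print(lost, round(observed_alt_fraction(0.5, lost), 3))
    # 0.0 0.5
    # 0.5 0.333
    # 1.0 0.0
```

In this toy model, `misaligned = 1.0` mirrors the complete loss of coverage described for the four variants listed above: a variant present at 50% heteroplasmy yields an observed alternative-allele fraction of zero and is therefore invisible to the variant caller. Partial loss gives intermediate values below 0.5, which is the direction of the reference-allele bias the description reports.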