Unbiased population heterozygosity estimates from genome‐wide sequence data

Detailed Bibliography
Main Authors: Thomas L. Schmidt, Moshe‐Elijah Jasper, Andrew R Weeks, Ary A Hoffmann
Material Type: Article
Language: English
Publication Information: Wiley, 2021-10-01
Series: Methods in Ecology and Evolution
Online Access: https://doi.org/10.1111/2041-210X.13659
Collection: DOAJ (Directory of Open Access Journals)
Abstract: Heterozygosity is a metric of genetic variability frequently used to inform the management of threatened taxa. Estimating observed and expected heterozygosities from genome‐wide sequence data has become increasingly common, and these estimates are often derived directly from genotypes at single nucleotide polymorphism (SNP) markers. While many SNP markers can provide precise estimates of genetic processes, the results of ‘downstream’ analysis with these markers may depend heavily on ‘upstream’ filtering decisions. Here we explore the downstream consequences of sample size, rare allele filtering, missing data thresholds and known population structure on estimates of observed and expected heterozygosity using two reduced‐representation sequencing datasets, one from the mosquito Aedes aegypti (ddRADseq) and the other from a threatened grasshopper, Keyacris scurra (DArTseq). We show that estimates based on polymorphic markers only (i.e. SNP heterozygosity) are always biased by global sample size (N), with smaller N producing larger estimates. By contrast, results are unbiased by sample size when calculations consider monomorphic as well as polymorphic sequence information (i.e. genome‐wide or autosomal heterozygosity). SNP heterozygosity is also biased when differentiated populations are analysed together while autosomal heterozygosity remains unbiased. We also show that when nucleotide sites with missing genotypes are included, observed and expected heterozygosity estimates diverge in proportion to the amount of missing data permitted at each site. We make three recommendations for estimating genome‐wide heterozygosity: (a) autosomal heterozygosity should be reported instead of (or in addition to) SNP heterozygosity; (b) sites with any missing data should be omitted and (c) populations should be analysed in independent runs. This should facilitate comparisons within and across studies and between observed and expected measures of heterozygosity.
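The difference between SNP and autosomal heterozygosity comes down to the denominator: SNP estimates average only over sites that are polymorphic in the sample, while autosomal estimates divide the same sums by all retained sites, monomorphic ones included. The Python sketch below illustrates this under stated assumptions and is not the authors' pipeline. Genotypes are assumed to be coded 0/1/2 (count of alternate alleles) with `None` for a missing call; expected heterozygosity uses Nei's small-sample correction 2n/(2n-1) x 2pq, which may differ from the estimator used in the article; and the function names (`site_ho_he`, `heterozygosity`, `simulate_population`) and allele frequencies are hypothetical. It drops any site with a missing call, following recommendation (b), and is meant to be run on one population at a time, following recommendation (c).

```python
import random
from typing import Dict, List, Optional, Tuple

# Assumed coding for one biallelic site across individuals:
# 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate,
# None = missing genotype call.
Site = List[Optional[int]]


def drop_sites_with_missing(sites: List[Site]) -> List[Site]:
    """Keep only sites genotyped in every individual (recommendation b)."""
    return [s for s in sites if all(g is not None for g in s)]


def site_ho_he(site: Site) -> Tuple[float, float]:
    """Observed heterozygosity and Nei's small-sample-corrected expected
    heterozygosity, 2n/(2n-1) * 2pq, at one fully genotyped site."""
    n = len(site)
    ho = sum(1 for g in site if g == 1) / n
    p = sum(site) / (2 * n)  # alternate-allele frequency in the sample
    he = (2 * n / (2 * n - 1)) * 2 * p * (1 - p)
    return ho, he


def heterozygosity(sites: List[Site]) -> Dict[str, float]:
    """SNP and autosomal Ho/He for ONE population (recommendation c).

    `sites` must include monomorphic sites. SNP estimates average over sites
    polymorphic in this sample; autosomal estimates divide the same sums by
    the total number of retained sites, so monomorphic sites count as zero.
    """
    per_site = [site_ho_he(s) for s in drop_sites_with_missing(sites)]
    snps = [(ho, he) for ho, he in per_site if he > 0]  # polymorphic here
    if not snps:
        raise ValueError("no polymorphic sites in this sample")
    sum_ho = sum(ho for ho, _ in snps)  # monomorphic sites add 0 anyway
    sum_he = sum(he for _, he in snps)
    return {
        "snp_ho": sum_ho / len(snps),
        "snp_he": sum_he / len(snps),
        "auto_ho": sum_ho / len(per_site),
        "auto_he": sum_he / len(per_site),
    }


def simulate_population(n_ind: int, freqs: List[float],
                        rng: random.Random) -> List[Site]:
    """Hypothetical Hardy-Weinberg genotypes: count of alternate alleles in
    two independent draws per individual, one row per site."""
    return [[int(rng.random() < p) + int(rng.random() < p) for _ in range(n_ind)]
            for p in freqs]


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical frequencies: 1000 monomorphic, 500 rare, 500 common sites.
    freqs = [0.0] * 1000 + [0.01] * 500 + [0.3] * 500
    for n_ind in (4, 100):
        h = heterozygosity(simulate_population(n_ind, freqs, rng))
        print(n_ind, {k: round(v, 3) for k, v in h.items()})
```

In this toy simulation, SNP heterozygosity at N = 4 should come out well above its N = 100 value: rare-allele sites are often monomorphic in a small sample and drop out of the SNP denominator, whereas the autosomal estimates keep those sites (contributing zero) and stay close to each other. The sketch only applies the recommended missing-data filter; it does not model the divergence between observed and expected heterozygosity that the abstract reports when missing data are permitted.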
ISSN: 2041-210X
Record ID: doaj.art-a9b51cc48f2c44a7a40b40bbeb7abb32
First and last indexed: 2024-04-25T02:16:23Z
Citation: Methods in Ecology and Evolution 12(10): 1888–1898
Affiliation (all authors): School of BioSciences, Bio21 Institute, University of Melbourne, Parkville, VIC, Australia
Subjects: conservation; DArTseq; filtering; genetic mixing; heterozygosity; population structure