Unbiased population heterozygosity estimates from genome‐wide sequence data

Abstract Heterozygosity is a metric of genetic variability frequently used to inform the management of threatened taxa. Estimating observed and expected heterozygosities from genome‐wide sequence data has become increasingly common, and these estimates are often derived directly from genotypes at si...


Bibliographic Details
Main Authors:	Thomas L. Schmidt, Moshe‐Elijah Jasper, Andrew R Weeks, Ary A Hoffmann
Format:	Article
Language:	English
Published:	Wiley 2021-10-01
Series:	Methods in Ecology and Evolution
Subjects:	conservation, DArTseq, filtering, genetic mixing, heterozygosity, population structure
Online Access:	https://doi.org/10.1111/2041-210X.13659

author	Thomas L. Schmidt, Moshe‐Elijah Jasper, Andrew R Weeks, Ary A Hoffmann
collection	DOAJ
description	Abstract Heterozygosity is a metric of genetic variability frequently used to inform the management of threatened taxa. Estimating observed and expected heterozygosities from genome‐wide sequence data has become increasingly common, and these estimates are often derived directly from genotypes at single nucleotide polymorphism (SNP) markers. While many SNP markers can provide precise estimates of genetic processes, the results of ‘downstream’ analysis with these markers may depend heavily on ‘upstream’ filtering decisions. Here we explore the downstream consequences of sample size, rare allele filtering, missing data thresholds and known population structure on estimates of observed and expected heterozygosity using two reduced‐representation sequencing datasets, one from the mosquito Aedes aegypti (ddRADseq) and the other from a threatened grasshopper, Keyacris scurra (DArTseq). We show that estimates based on polymorphic markers only (i.e. SNP heterozygosity) are always biased by global sample size (N), with smaller N producing larger estimates. By contrast, results are unbiased by sample size when calculations consider monomorphic as well as polymorphic sequence information (i.e. genome‐wide or autosomal heterozygosity). SNP heterozygosity is also biased when differentiated populations are analysed together while autosomal heterozygosity remains unbiased. We also show that when nucleotide sites with missing genotypes are included, observed and expected heterozygosity estimates diverge in proportion to the amount of missing data permitted at each site. We make three recommendations for estimating genome‐wide heterozygosity: (a) autosomal heterozygosity should be reported instead of (or in addition to) SNP heterozygosity; (b) sites with any missing data should be omitted and (c) populations should be analysed in independent runs. This should facilitate comparisons within and across studies and between observed and expected measures of heterozygosity.
first_indexed	2024-04-25T02:16:23Z
format	Article
id	doaj.art-a9b51cc48f2c44a7a40b40bbeb7abb32
institution	Directory Open Access Journal
issn	2041-210X
language	English
last_indexed	2024-04-25T02:16:23Z
publishDate	2021-10-01
publisher	Wiley
record_format	Article
series	Methods in Ecology and Evolution
spelling	doaj.art-a9b51cc48f2c44a7a40b40bbeb7abb32; 2024-03-07T08:56:54Z; eng; Wiley; Methods in Ecology and Evolution; ISSN 2041-210X; 2021-10-01; volume 12, issue 10, pages 1888-1898; DOI 10.1111/2041-210X.13659; Unbiased population heterozygosity estimates from genome‐wide sequence data; Thomas L. Schmidt, Moshe‐Elijah Jasper, Andrew R Weeks, Ary A Hoffmann (all: School of BioSciences, Bio21 Institute, University of Melbourne, Parkville VIC, Australia); https://doi.org/10.1111/2041-210X.13659; conservation, DArTseq, filtering, genetic mixing, heterozygosity, population structure
title	Unbiased population heterozygosity estimates from genome‐wide sequence data
topic	conservation, DArTseq, filtering, genetic mixing, heterozygosity, population structure
url	https://doi.org/10.1111/2041-210X.13659
