Unbiased population heterozygosity estimates from genome‐wide sequence data
Abstract Heterozygosity is a metric of genetic variability frequently used to inform the management of threatened taxa. Estimating observed and expected heterozygosities from genome‐wide sequence data has become increasingly common, and these estimates are often derived directly from genotypes at si...
Main Authors: | Thomas L. Schmidt, Moshe‐Elijah Jasper, Andrew R Weeks, Ary A Hoffmann |
---|---|
Material Type: | Article |
Language: | English |
Published: | Wiley 2021-10-01 |
Series: | Methods in Ecology and Evolution |
Subjects: | conservation, DArTseq, filtering, genetic mixing, heterozygosity, population structure |
Online Access: | https://doi.org/10.1111/2041-210X.13659 |
_version_ | 1827324589291601920 |
---|---|
author | Thomas L. Schmidt Moshe‐Elijah Jasper Andrew R Weeks Ary A Hoffmann |
author_facet | Thomas L. Schmidt Moshe‐Elijah Jasper Andrew R Weeks Ary A Hoffmann |
author_sort | Thomas L. Schmidt |
collection | DOAJ |
description | Abstract Heterozygosity is a metric of genetic variability frequently used to inform the management of threatened taxa. Estimating observed and expected heterozygosities from genome‐wide sequence data has become increasingly common, and these estimates are often derived directly from genotypes at single nucleotide polymorphism (SNP) markers. While many SNP markers can provide precise estimates of genetic processes, the results of ‘downstream’ analysis with these markers may depend heavily on ‘upstream’ filtering decisions. Here we explore the downstream consequences of sample size, rare allele filtering, missing data thresholds and known population structure on estimates of observed and expected heterozygosity using two reduced‐representation sequencing datasets, one from the mosquito Aedes aegypti (ddRADseq) and the other from a threatened grasshopper, Keyacris scurra (DArTseq). We show that estimates based on polymorphic markers only (i.e. SNP heterozygosity) are always biased by global sample size (N), with smaller N producing larger estimates. By contrast, results are unbiased by sample size when calculations consider monomorphic as well as polymorphic sequence information (i.e. genome‐wide or autosomal heterozygosity). SNP heterozygosity is also biased when differentiated populations are analysed together while autosomal heterozygosity remains unbiased. We also show that when nucleotide sites with missing genotypes are included, observed and expected heterozygosity estimates diverge in proportion to the amount of missing data permitted at each site. We make three recommendations for estimating genome‐wide heterozygosity: (a) autosomal heterozygosity should be reported instead of (or in addition to) SNP heterozygosity; (b) sites with any missing data should be omitted and (c) populations should be analysed in independent runs. This should facilitate comparisons within and across studies and between observed and expected measures of heterozygosity. 
|
first_indexed | 2024-04-25T02:16:23Z |
format | Article |
id | doaj.art-a9b51cc48f2c44a7a40b40bbeb7abb32 |
institution | Directory Open Access Journal |
issn | 2041-210X |
language | English |
last_indexed | 2024-04-25T02:16:23Z |
publishDate | 2021-10-01 |
publisher | Wiley |
record_format | Article |
series | Methods in Ecology and Evolution |
spelling | doaj.art-a9b51cc48f2c44a7a40b40bbeb7abb32 2024-03-07T08:56:54Z eng Wiley Methods in Ecology and Evolution 2041-210X 2021-10-01 12 10 1888 1898 10.1111/2041-210X.13659 Unbiased population heterozygosity estimates from genome‐wide sequence data Thomas L. Schmidt 0 Moshe‐Elijah Jasper 1 Andrew R Weeks 2 Ary A Hoffmann 3 School of BioSciences Bio21 Institute University of Melbourne Parkville VIC Australia School of BioSciences Bio21 Institute University of Melbourne Parkville VIC Australia School of BioSciences Bio21 Institute University of Melbourne Parkville VIC Australia School of BioSciences Bio21 Institute University of Melbourne Parkville VIC Australia Abstract Heterozygosity is a metric of genetic variability frequently used to inform the management of threatened taxa. Estimating observed and expected heterozygosities from genome‐wide sequence data has become increasingly common, and these estimates are often derived directly from genotypes at single nucleotide polymorphism (SNP) markers. While many SNP markers can provide precise estimates of genetic processes, the results of ‘downstream’ analysis with these markers may depend heavily on ‘upstream’ filtering decisions. Here we explore the downstream consequences of sample size, rare allele filtering, missing data thresholds and known population structure on estimates of observed and expected heterozygosity using two reduced‐representation sequencing datasets, one from the mosquito Aedes aegypti (ddRADseq) and the other from a threatened grasshopper, Keyacris scurra (DArTseq). We show that estimates based on polymorphic markers only (i.e. SNP heterozygosity) are always biased by global sample size (N), with smaller N producing larger estimates. By contrast, results are unbiased by sample size when calculations consider monomorphic as well as polymorphic sequence information (i.e. genome‐wide or autosomal heterozygosity). SNP heterozygosity is also biased when differentiated populations are analysed together while autosomal heterozygosity remains unbiased. We also show that when nucleotide sites with missing genotypes are included, observed and expected heterozygosity estimates diverge in proportion to the amount of missing data permitted at each site. We make three recommendations for estimating genome‐wide heterozygosity: (a) autosomal heterozygosity should be reported instead of (or in addition to) SNP heterozygosity; (b) sites with any missing data should be omitted and (c) populations should be analysed in independent runs. This should facilitate comparisons within and across studies and between observed and expected measures of heterozygosity. https://doi.org/10.1111/2041-210X.13659 conservation DArTseq filtering genetic mixing heterozygosity population structure |
spellingShingle | Thomas L. Schmidt Moshe‐Elijah Jasper Andrew R Weeks Ary A Hoffmann Unbiased population heterozygosity estimates from genome‐wide sequence data Methods in Ecology and Evolution conservation DArTseq filtering genetic mixing heterozygosity population structure |
title | Unbiased population heterozygosity estimates from genome‐wide sequence data |
title_full | Unbiased population heterozygosity estimates from genome‐wide sequence data |
title_fullStr | Unbiased population heterozygosity estimates from genome‐wide sequence data |
title_full_unstemmed | Unbiased population heterozygosity estimates from genome‐wide sequence data |
title_short | Unbiased population heterozygosity estimates from genome‐wide sequence data |
title_sort | unbiased population heterozygosity estimates from genome wide sequence data |
topic | conservation DArTseq filtering genetic mixing heterozygosity population structure |
url | https://doi.org/10.1111/2041-210X.13659 |
work_keys_str_mv | AT thomaslschmidt unbiasedpopulationheterozygosityestimatesfromgenomewidesequencedata AT mosheelijahjasper unbiasedpopulationheterozygosityestimatesfromgenomewidesequencedata AT andrewrweeks unbiasedpopulationheterozygosityestimatesfromgenomewidesequencedata AT aryahoffmann unbiasedpopulationheterozygosityestimatesfromgenomewidesequencedata |