How array design creates SNP ascertainment bias.

Single nucleotide polymorphisms (SNPs), genotyped with arrays, have become a widely used marker type in population genetic analyses over the last 10 years. However, compared to whole genome re-sequencing data, arrays are known to lack a substantial proportion of globally rare variants and tend to be...

Bibliographic Details
Main Authors: Johannes Geibel, Christian Reimer, Steffen Weigend, Annett Weigend, Torsten Pook, Henner Simianer
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2021-01-01
Series: PLoS ONE
Online Access: https://publications.goettingen-research-online.de/bitstream/2/85209/1/journal.pone.0245178.pdf
_version_ 1826554406768738304
author Johannes Geibel
Christian Reimer
Steffen Weigend
Annett Weigend
Torsten Pook
Henner Simianer
collection DOAJ
description Single nucleotide polymorphisms (SNPs), genotyped with arrays, have become a widely used marker type in population genetic analyses over the last 10 years. However, compared to whole genome re-sequencing data, arrays are known to lack a substantial proportion of globally rare variants and tend to be biased towards variants present in populations involved in the development process of the respective array. This affects population genetic estimators and is known as SNP ascertainment bias. We investigated factors contributing to ascertainment bias in array development by redesigning the Axiom™ Genome-Wide Chicken Array in silico and evaluating changes in allele frequency spectra and heterozygosity estimates in a stepwise manner. A sequential reduction of rare alleles during the development process was shown. This was mainly caused by the identification of SNPs in a limited set of populations and a within-population selection of common SNPs when aiming for equidistant spacing. These effects were shown to be less severe with a larger discovery panel. Additionally, a generally massive overestimation of expected heterozygosity for the ascertained SNP sets was shown. For the original array, this overestimation was 24% higher for populations involved in the discovery process than for populations not involved in it. The same was observed after the SNP discovery step in the redesign. However, an unequal contribution of populations during the SNP selection can mask this effect but also adds uncertainty. Finally, we make suggestions for the design of specialized arrays for large-scale projects where whole genome re-sequencing techniques are still too expensive.
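The mechanism described in the abstract, where the loss of rare variants inflates heterozygosity estimates, can be illustrated with a minimal sketch. It is not the authors' pipeline: the allele frequencies, the `min_maf` threshold, and the function names below are hypothetical, and the ascertainment step is reduced to a simple minor-allele-frequency filter. Expected heterozygosity of a biallelic SNP with allele frequency p is taken as 2p(1-p).

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1-p) of a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def mean_he(freqs):
    """Mean expected heterozygosity over a set of SNP allele frequencies."""
    return sum(expected_heterozygosity(p) for p in freqs) / len(freqs)


def ascertain(freqs, min_maf):
    """Hypothetical ascertainment step: keep SNPs with minor allele frequency >= min_maf."""
    return [p for p in freqs if min(p, 1.0 - p) >= min_maf]


# Made-up allele frequencies, skewed towards rare variants as in sequence data.
all_snps = [0.01, 0.02, 0.02, 0.05, 0.05, 0.10, 0.20, 0.30, 0.40, 0.50]

# Assumed array-like subset: rare variants are dropped by the filter.
array_snps = ascertain(all_snps, min_maf=0.10)

print("mean He, all SNPs:  ", mean_he(all_snps))
print("mean He, array SNPs:", mean_he(array_snps))
```

In this toy data the filtered set has a higher mean expected heterozygosity than the full set, because the SNPs that were removed are exactly those with low 2p(1-p).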
first_indexed 2024-12-19T00:08:16Z
format Article
id doaj.art-17d50d8db5854ef2bf97017ea74bdb81
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2025-03-14T07:40:25Z
publishDate 2021-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-17d50d8db5854ef2bf97017ea74bdb81 2025-03-03T05:35:37Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2021-01-01 16 3 e0245178 10.1371/journal.pone.0245178 How array design creates SNP ascertainment bias. Johannes Geibel, Christian Reimer, Steffen Weigend, Annett Weigend, Torsten Pook, Henner Simianer https://publications.goettingen-research-online.de/bitstream/2/85209/1/journal.pone.0245178.pdf
title How array design creates SNP ascertainment bias.
url https://publications.goettingen-research-online.de/bitstream/2/85209/1/journal.pone.0245178.pdf