Impact of reference population and marker density on accuracy of population imputation

The effect of the reference population size and the number of missing single nucleotide polymorphisms (SNPs) on imputation accuracy was determined. The population imputation method using the FImpute software was applied. The dataset used for the purpose of this study was taken from the database of t...

Bibliographic Details
Main Authors: Anita Kranjčevičová, Eva Kašná, Michaela Brzáková, Josef Přibyl, Luboš Vostrý
Format: Article
Language: English
Published: Czech Academy of Agricultural Sciences 2019-10-01
Series: Czech Journal of Animal Science
Subjects: cattle, genomics, marker density, missing SNPs, simulation
Online Access: https://cjas.agriculturejournals.cz/artkey/cjs-201910-0001_impact-of-reference-population-and-marker-density-on-accuracy-of-population-imputation.php
author Anita Kranjčevičová
Eva Kašná
Michaela Brzáková
Josef Přibyl
Luboš Vostrý
collection DOAJ
description The effect of the reference population size and the number of missing single nucleotide polymorphisms (SNPs) on imputation accuracy was determined. The population imputation method using the FImpute software was applied. The dataset used for the purpose of this study was taken from the database of the Holstein Cattle Breeders Association of the Czech Republic. It contains 1000 animals genotyped with the Illumina BovineSNP50 v.2 BeadChip. Two datasets were created: the first contained the original genotypes, including the missing SNPs; the second contained the same genotypes modified to avoid missing data. In each dataset, animals were randomly selected for a reference population (10, 25, 50 and 75%), and SNPs were randomly selected for deletion (15, 30, 55, 70 and 95%) in the animals not used as the reference population. Imputation accuracy was then measured by two parameters: the correlation between original and imputed SNPs and the percentage of correctly imputed SNPs. Because animals and SNPs were selected at random, the whole process, including imputation, was repeated 100 times, and accuracy was reported as the average over all repetitions. Imputation accuracy was influenced by both the reference population size and the number of missing SNPs. If the reference population is sufficiently large, imputation accuracy is higher even when a large number of SNPs is missing.
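The evaluation loop described above (random reference split, SNP masking, two accuracy measures, averaging over repetitions) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: genotypes are assumed to be coded 0/1/2, the same masked SNP set is assumed for all non-reference animals, and `impute_mode` is a hypothetical stand-in that fills each masked SNP with the most frequent reference genotype. It does not reproduce FImpute, and all function names here are illustrative.

```python
import random
from collections import Counter
from math import sqrt


def toy_genotypes(n_animals, n_snps, seed=0):
    """Random 0/1/2 genotypes, used only to make the sketch runnable."""
    rng = random.Random(seed)
    return [[rng.choice((0, 1, 2)) for _ in range(n_snps)] for _ in range(n_animals)]


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def impute_mode(reference, snp):
    """Stand-in imputation (NOT FImpute): most frequent reference genotype."""
    return Counter(animal[snp] for animal in reference).most_common(1)[0][0]


def one_replicate(genotypes, ref_fraction, missing_fraction, rng):
    """One random split and masking; returns (correlation, percent correct)."""
    n_animals, n_snps = len(genotypes), len(genotypes[0])
    ids = list(range(n_animals))
    rng.shuffle(ids)
    n_ref = round(ref_fraction * n_animals)
    reference = [genotypes[i] for i in ids[:n_ref]]
    targets = [genotypes[i] for i in ids[n_ref:]]
    # Assumption: the same SNPs are deleted in every non-reference animal.
    masked = rng.sample(range(n_snps), round(missing_fraction * n_snps))
    fill = {s: impute_mode(reference, s) for s in masked}
    original = [animal[s] for animal in targets for s in masked]
    imputed = [fill[s] for animal in targets for s in masked]
    correct = sum(o == i for o, i in zip(original, imputed))
    return pearson(original, imputed), 100.0 * correct / len(original)


def mean_accuracy(genotypes, ref_fraction, missing_fraction, n_reps=100, seed=1):
    """Average both accuracy measures over n_reps random repetitions."""
    rng = random.Random(seed)
    results = [one_replicate(genotypes, ref_fraction, missing_fraction, rng)
               for _ in range(n_reps)]
    return (sum(r[0] for r in results) / n_reps,
            sum(r[1] for r in results) / n_reps)


if __name__ == "__main__":
    data = toy_genotypes(n_animals=40, n_snps=20)
    corr, pct = mean_accuracy(data, ref_fraction=0.25, missing_fraction=0.30, n_reps=10)
    print(f"mean correlation = {corr:.3f}, mean percent correct = {pct:.1f}")
```

With real data, the stand-in would be replaced by a call to the actual imputation software, and the same two measures would be computed on its output for each combination of reference fraction and missing fraction.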
first_indexed 2024-04-10T08:25:27Z
format Article
id doaj.art-c7ff44b8511b4434aaf36b3b448261ac
institution Directory Open Access Journal
issn 1212-1819
1805-9309
language English
last_indexed 2024-04-10T08:25:27Z
publishDate 2019-10-01
publisher Czech Academy of Agricultural Sciences
record_format Article
series Czech Journal of Animal Science
spelling doaj.art-c7ff44b8511b4434aaf36b3b448261ac
2023-02-23T03:33:40Z
eng
Czech Academy of Agricultural Sciences
Czech Journal of Animal Science
1212-1819
1805-9309
2019-10-01
64
10
405
410
10.17221/148/2019-CJAS
cjs-201910-0001
Impact of reference population and marker density on accuracy of population imputation
Anita Kranjčevičová (0)
Eva Kašná (1)
Michaela Brzáková (2)
Josef Přibyl (3)
Luboš Vostrý (4)
(0) Department of Genetics and Breeding, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, Prague, Czech Republic
(1) Department of Genetics and Breeding of Farm Animals, Institute of Animal Science, Prague-Uhříněves, Czech Republic
(2) Department of Genetics and Breeding, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, Prague, Czech Republic
(3) Department of Genetics and Breeding of Farm Animals, Institute of Animal Science, Prague-Uhříněves, Czech Republic
(4) Department of Genetics and Breeding, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, Prague, Czech Republic
title Impact of reference population and marker density on accuracy of population imputation
topic cattle
genomics
marker density
missing snps
simulation