Imputation from SNP chip to sequence: a case study in a Chinese indigenous chicken population

Abstract Background Genome-wide association studies and genomic predictions are thought to be optimized by using whole-genome sequence (WGS) data. However, sequencing thousands of individuals of interest is expensive. Imputation from SNP panels to WGS data is an attractive and less expensive approac...

Bibliographic Details
Main Authors: Shaopan Ye, Xiaolong Yuan, Xiran Lin, Ning Gao, Yuanyu Luo, Zanmou Chen, Jiaqi Li, Xiquan Zhang, Zhe Zhang
Format: Article
Language: English
Published: BMC 2018-03-01
Series: Journal of Animal Science and Biotechnology
Subjects: Chickens, Imputation, Re-sequencing, SNP
Online Access: http://link.springer.com/article/10.1186/s40104-018-0241-5
author Shaopan Ye
Xiaolong Yuan
Xiran Lin
Ning Gao
Yuanyu Luo
Zanmou Chen
Jiaqi Li
Xiquan Zhang
Zhe Zhang
author_sort Shaopan Ye
collection DOAJ
description Abstract. Background: Genome-wide association studies (GWAS) and genomic predictions are thought to be optimized by using whole-genome sequence (WGS) data. However, sequencing thousands of individuals of interest is expensive. Imputation from SNP panels to WGS data is an attractive and less expensive approach to obtaining WGS data. The aims of this study were to investigate the accuracy of imputation and to provide insight into the design and execution of genotype imputation. Results: We genotyped 450 chickens with a 600 K SNP array and sequenced 24 key individuals by whole-genome re-sequencing. Accuracy of imputation from putative 60 K and 600 K array data to WGS data was 0.620 and 0.812 for Beagle, and 0.810 and 0.914 for FImpute, respectively. When the sequencing cost increased from 24X to 144X, imputation accuracy increased from 0.525 to 0.698 for Beagle and from 0.654 to 0.823 for FImpute. With fixed sequencing depth (12X), increasing the number of sequenced animals from 1 to 24 improved accuracy from 0.421 to 0.897 for FImpute and from 0.396 to 0.777 for Beagle. Using optimally selected key individuals as the reference population for re-sequencing gave higher imputation accuracy than using randomly selected individuals. With fixed reference population size (24), imputation accuracy increased from 0.654 to 0.875 for FImpute and from 0.512 to 0.762 for Beagle as the sequencing depth increased from 1X to 12X. With a given total cost of genotyping, accuracy increased with the size of the reference population for FImpute, but this pattern did not hold for Beagle, which showed its highest accuracy at sixfold coverage in the scenarios used in this study. Conclusions: We comprehensively investigated the impacts of several key factors on genotype imputation. Generally, increasing sequencing cost gave higher imputation accuracy. With a fixed sequencing cost, however, the optimal imputation strategy enhances the performance of whole-genome prediction (WGP) and GWAS. An optimal imputation strategy should jointly consider the size of the reference population, the imputation algorithm, marker density, the population structure of the target population, and the method used to select key individuals. This work sheds additional light on how to design and execute genotype imputation for livestock populations.
first_indexed 2024-12-20T03:39:31Z
format Article
id doaj.art-cc6ca961553c4c91bc6c36ef71da031d
institution Directory Open Access Journal
issn 2049-1891
language English
last_indexed 2024-12-20T03:39:31Z
publishDate 2018-03-01
publisher BMC
record_format Article
series Journal of Animal Science and Biotechnology
spelling doaj.art-cc6ca961553c4c91bc6c36ef71da031d
2022-12-21T19:54:46Z
eng
BMC
Journal of Animal Science and Biotechnology
2049-1891
2018-03-01
Volume 9, issue 1, pages 1-12
10.1186/s40104-018-0241-5
Imputation from SNP chip to sequence: a case study in a Chinese indigenous chicken population
Shaopan Ye, Xiaolong Yuan, Xiran Lin, Ning Gao, Yuanyu Luo, Zanmou Chen, Jiaqi Li, Xiquan Zhang, Zhe Zhang (all authors: Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University)
http://link.springer.com/article/10.1186/s40104-018-0241-5
Chickens, Imputation, Re-sequencing, SNP
title Imputation from SNP chip to sequence: a case study in a Chinese indigenous chicken population
topic Chickens
Imputation
Re-sequencing
SNP
url http://link.springer.com/article/10.1186/s40104-018-0241-5