Variable selection models for genomic selection using whole-genome sequence data and singular value decomposition

Bibliographic Details
Main Authors:	Theo H. E. Meuwissen, Ulf G. Indahl, Jørgen Ødegård
Format:	Article
Language:	English
Published:	BMC 2017-12-01
Series:	Genetics Selection Evolution
Online Access:	http://link.springer.com/article/10.1186/s12711-017-0369-3

author	Theo H. E. Meuwissen Ulf G. Indahl Jørgen Ødegård
collection	DOAJ
description	Abstract. Background: Non-linear Bayesian genomic prediction models such as BayesA/B/C/R involve iteration and mostly Markov chain Monte Carlo (MCMC) algorithms, which are computationally expensive, especially when whole-genome sequence (WGS) data are analyzed. Singular value decomposition (SVD) of the genotype matrix can facilitate genomic prediction in large datasets, and can be used to estimate marker effects and their prediction error variances (PEV) in a computationally efficient manner. Here, we developed, implemented, and evaluated a direct, non-iterative method for the estimation of marker effects for the BayesC genomic prediction model. Methods: The BayesC model assumes a priori that markers have normally distributed effects with probability π and no effect with probability (1 − π). Marker effects and their PEV are estimated by using SVD, and the posterior probability of the marker having a non-zero effect is calculated. These posterior probabilities are used to obtain marker-specific effect variances, which are subsequently used to approximate BayesC estimates of marker effects in a linear model. A computer simulation study was conducted to compare alternative genomic prediction methods, where a single reference generation was used to estimate marker effects, which were subsequently used for 10 generations of forward prediction, for which accuracies were evaluated. Results: SVD-based posterior probabilities of markers having non-zero effects were generally lower than MCMC-based posterior probabilities, but for some regions the opposite occurred, resulting in clear signals for QTL-rich regions. The accuracies of breeding values estimated using SVD- and MCMC-based BayesC analyses were similar across the 10 generations of forward prediction.
For an intermediate number of generations (2 to 5) of forward prediction, accuracies obtained with the BayesC model tended to be slightly higher than accuracies obtained using the best linear unbiased prediction of SNP effects (SNP-BLUP model). When reducing marker density from WGS data to 30K, SNP-BLUP tended to yield the highest accuracies, at least in the short term. Conclusions: Based on SVD of the genotype matrix, we developed a direct method for the calculation of BayesC estimates of marker effects. Although SVD- and MCMC-based marker effects differed slightly, their prediction accuracies were similar. Assuming that the SVD of the marker genotype matrix is already performed for other reasons (e.g., for SNP-BLUP), computation times for the BayesC predictions were comparable to those of SNP-BLUP.
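The Methods summary outlines three steps: SNP-BLUP marker effects and their PEV from the SVD of the genotype matrix, posterior probabilities that each marker has a non-zero effect, and a re-weighted linear model using marker-specific effect variances. The NumPy sketch below illustrates that pipeline under stated assumptions: the variance components `var_b` and `var_e` and the prior probability π are treated as known, and the posterior-probability formula is a simple Gaussian approximation of our own (it compares the sampling variance of each SNP-BLUP estimate with and without an effect at that marker), not necessarily the formula used in the article. All function names are hypothetical.

```python
import numpy as np


def svd_snp_blup(X, y, var_b, var_e):
    """SNP-BLUP (ridge) effects, their PEV and h_j = [(X'X + lam I)^-1 X'X]_jj via thin SVD."""
    lam = var_e / var_b
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt.T                                       # m x r, r = min(n, m)
    shrink = 1.0 / (d**2 + lam)                    # eigenvalues of (X'X + lam I)^-1 on span(V)
    b_hat = V @ (shrink * d * (U.T @ y))           # equals (X'X + lam I)^-1 X'y
    # diag((X'X + lam I)^-1) = in-span part + (1/lam) * out-of-span part
    in_span = (V**2) @ shrink
    out_span = np.clip(1.0 - (V**2).sum(axis=1), 0.0, None) / lam
    pev = var_e * (in_span + out_span)             # PEV_j = var_e * C_jj
    h = 1.0 - pev / var_b
    return b_hat, pev, h


def log_normal_pdf(x, var):
    return -0.5 * x**2 / var - 0.5 * np.log(2.0 * np.pi * var)


def posterior_nonzero(b_hat, h, var_b, pi):
    """ASSUMED Gaussian approximation of P(marker j has a non-zero effect | b_hat_j)."""
    h = np.clip(h, 0.0, 1.0)
    tau2 = var_b / pi                              # variance of a non-zero effect (pi * tau2 = var_b)
    s0 = var_b * h * (1.0 - h) + 1e-12             # Var(b_hat_j) if marker j has no effect
    s1 = s0 + h**2 * tau2                          # Var(b_hat_j) if b_j ~ N(0, tau2)
    log_odds = (np.log(pi) + log_normal_pdf(b_hat, s1)
                - np.log1p(-pi) - log_normal_pdf(b_hat, s0))
    return 1.0 / (1.0 + np.exp(-np.clip(log_odds, -700, 700))), tau2


def bayesc_like_effects(X, y, var_marker, var_e):
    """BLUP of b with Var(b) = diag(var_marker): Sigma X' (X Sigma X' + var_e I)^-1 y."""
    XS = X * var_marker                            # X @ diag(var_marker)
    V_y = XS @ X.T + var_e * np.eye(X.shape[0])    # n x n phenotypic covariance
    return XS.T @ np.linalg.solve(V_y, y)


# Toy simulation (hypothetical settings): 300 animals, 1000 SNPs, 10 QTL, h2 about 0.5.
rng = np.random.default_rng(1)
n, m, pi = 300, 1000, 0.01
X = rng.binomial(2, 0.5, size=(n, m)).astype(float)
X -= X.mean(axis=0)
true_b = np.zeros(m)
qtl = rng.choice(m, size=10, replace=False)
true_b[qtl] = rng.normal(0.0, 1.0, size=10)
g = X @ true_b
var_e = g.var()
y = g + rng.normal(0.0, np.sqrt(var_e), n)
y -= y.mean()
var_b = g.var() / X.var(axis=0).sum()              # assumes known genetic variance

b_blup, pev, h = svd_snp_blup(X, y, var_b, var_e)
p, tau2 = posterior_nonzero(b_blup, h, var_b, pi)
b_c = bayesc_like_effects(X, y, p * tau2, var_e)   # marker-specific variances p_j * tau2
print(f"mean P(non-zero): QTL {p[qtl].mean():.3f}, others {np.delete(p, qtl).mean():.3f}")
```

The last step uses the n × n form Σ X′(XΣX′ + σ²_e I)⁻¹y, which tolerates near-zero marker variances and stays cheap when markers far outnumber animals; it does not reuse the SVD, so it is not the computational shortcut the abstract refers to.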
first_indexed	2024-12-21T14:20:43Z
format	Article
id	doaj.art-f70c196068544bb7a7fac66a32147b8e
institution	Directory Open Access Journal
issn	1297-9686
language	eng
last_indexed	2024-12-21T14:20:43Z
publishDate	2017-12-01
publisher	BMC
record_format	Article
series	Genetics Selection Evolution
affiliation	Norwegian University of Life Sciences (Theo H. E. Meuwissen, Ulf G. Indahl, Jørgen Ødegård)
doi	10.1186/s12711-017-0369-3
title	Variable selection models for genomic selection using whole-genome sequence data and singular value decomposition
url	http://link.springer.com/article/10.1186/s12711-017-0369-3

Similar Items