Variable selection models for genomic selection using whole-genome sequence data and singular value decomposition

Abstract Background Non-linear Bayesian genomic prediction models such as BayesA/B/C/R involve iteration and mostly Markov chain Monte Carlo (MCMC) algorithms, which are computationally expensive, especially when whole-genome sequence (WGS) data are analyzed. Singular value decomposition (SVD) of th...

Bibliographic Details
Main Authors: Theo H. E. Meuwissen, Ulf G. Indahl, Jørgen Ødegård
Format: Article
Language: eng
Published: BMC 2017-12-01
Series: Genetics Selection Evolution
Online Access: http://link.springer.com/article/10.1186/s12711-017-0369-3
collection DOAJ
description Abstract
Background: Non-linear Bayesian genomic prediction models such as BayesA/B/C/R involve iteration and mostly Markov chain Monte Carlo (MCMC) algorithms, which are computationally expensive, especially when whole-genome sequence (WGS) data are analyzed. Singular value decomposition (SVD) of the genotype matrix can facilitate genomic prediction in large datasets, and can be used to estimate marker effects and their prediction error variances (PEV) in a computationally efficient manner. Here, we developed, implemented, and evaluated a direct, non-iterative method for the estimation of marker effects for the BayesC genomic prediction model.
Methods: The BayesC model assumes a priori that markers have normally distributed effects with probability π and no effect with probability (1 − π). Marker effects and their PEV are estimated by using SVD and the posterior probability of the marker having a non-zero effect is calculated. These posterior probabilities are used to obtain marker-specific effect variances, which are subsequently used to approximate BayesC estimates of marker effects in a linear model. A computer simulation study was conducted to compare alternative genomic prediction methods, where a single reference generation was used to estimate marker effects, which were subsequently used for 10 generations of forward prediction, for which accuracies were evaluated.
Results: SVD-based posterior probabilities of markers having non-zero effects were generally lower than MCMC-based posterior probabilities, but for some regions the opposite occurred, resulting in clear signals for QTL-rich regions. The accuracies of breeding values estimated using SVD- and MCMC-based BayesC analyses were similar across the 10 generations of forward prediction. For an intermediate number of generations (2 to 5) of forward prediction, accuracies obtained with the BayesC model tended to be slightly higher than accuracies obtained using the best linear unbiased prediction of SNP effects (SNP-BLUP model). When reducing marker density from WGS data to 30 K, SNP-BLUP tended to yield the highest accuracies, at least in the short term.
Conclusions: Based on SVD of the genotype matrix, we developed a direct method for the calculation of BayesC estimates of marker effects. Although SVD- and MCMC-based marker effects differed slightly, their prediction accuracies were similar. Assuming that the SVD of the marker genotype matrix is already performed for other reasons (e.g. for SNP-BLUP), computation times for the BayesC predictions were comparable to those of SNP-BLUP.
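As a rough illustration of the SVD step mentioned in the abstract, the sketch below computes SNP-BLUP (ridge regression) marker effects and their PEV from a thin SVD of a centred genotype matrix. It is not the authors' implementation and does not cover the BayesC posterior-probability step. The function name `snp_blup_svd`, the simulated data, the shrinkage parameter `lam` (residual variance divided by marker-effect variance) and the unit residual variance are all assumptions made for this example.

```python
import numpy as np

def snp_blup_svd(X, y, lam, sigma_e2=1.0):
    """Ridge (SNP-BLUP) marker effects and their PEV from a thin SVD of X.

    Assumes y = X b + e with b ~ N(0, (sigma_e2 / lam) I) and e ~ N(0, sigma_e2 I).
    """
    # Thin SVD: X = U diag(s) Vt, with at most min(n, m) components
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # b_hat = V diag(s / (s^2 + lam)) U' y
    b_hat = Vt.T @ ((s / (s**2 + lam)) * (U.T @ y))
    # PEV_j = sigma_b2 * (1 - sum_k V_jk^2 * s_k^2 / (s_k^2 + lam)),
    # i.e. the diagonal of sigma_e2 * (X'X + lam I)^-1
    sigma_b2 = sigma_e2 / lam
    pev = sigma_b2 * (1.0 - (Vt.T**2) @ (s**2 / (s**2 + lam)))
    return b_hat, pev

# Hypothetical toy data: 50 animals, 200 markers coded 0/1/2, then centred
rng = np.random.default_rng(0)
n, m = 50, 200
X = rng.binomial(2, 0.3, size=(n, m)).astype(float)
X -= X.mean(axis=0)
b_true = np.zeros(m)
b_true[rng.choice(m, 10, replace=False)] = rng.normal(size=10)
y = X @ b_true + rng.normal(size=n)

lam = 100.0
b_hat, pev = snp_blup_svd(X, y, lam)
```

Once the SVD is available, the marker effects and PEV follow from products with the factors and never require forming or inverting the m × m coefficient matrix, which is the source of the computational saving when m is much larger than n.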
first_indexed 2024-12-21T14:20:43Z
format Article
id doaj.art-f70c196068544bb7a7fac66a32147b8e
institution Directory Open Access Journal
issn 1297-9686
language eng
last_indexed 2024-12-21T14:20:43Z
publishDate 2017-12-01
publisher BMC
record_format Article
series Genetics Selection Evolution
spelling doaj.art-f70c196068544bb7a7fac66a32147b8e; 2022-12-21T19:00:48Z; eng; BMC; Genetics Selection Evolution; 1297-9686; 2017-12-01; 49119; 10.1186/s12711-017-0369-3; Variable selection models for genomic selection using whole-genome sequence data and singular value decomposition; Theo H. E. Meuwissen (Norwegian University of Life Sciences), Ulf G. Indahl (Norwegian University of Life Sciences), Jørgen Ødegård (Norwegian University of Life Sciences); http://link.springer.com/article/10.1186/s12711-017-0369-3