Variable selection models for genomic selection using whole-genome sequence data and singular value decomposition
Abstract. Background: Non-linear Bayesian genomic prediction models such as BayesA/B/C/R involve iteration and mostly Markov chain Monte Carlo (MCMC) algorithms, which are computationally expensive, especially when whole-genome sequence (WGS) data are analyzed. Singular value decomposition (SVD) of th...
Main Authors: | Theo H. E. Meuwissen, Ulf G. Indahl, Jørgen Ødegård |
---|---|
Format: | Article |
Language: | deu |
Published: | BMC 2017-12-01 |
Series: | Genetics Selection Evolution |
Online Access: | http://link.springer.com/article/10.1186/s12711-017-0369-3 |
_version_ | 1819060043483447296 |
---|---|
author | Theo H. E. Meuwissen, Ulf G. Indahl, Jørgen Ødegård |
author_facet | Theo H. E. Meuwissen, Ulf G. Indahl, Jørgen Ødegård |
author_sort | Theo H. E. Meuwissen |
collection | DOAJ |
description | Abstract. Background: Non-linear Bayesian genomic prediction models such as BayesA/B/C/R involve iteration and mostly Markov chain Monte Carlo (MCMC) algorithms, which are computationally expensive, especially when whole-genome sequence (WGS) data are analyzed. Singular value decomposition (SVD) of the genotype matrix can facilitate genomic prediction in large datasets, and can be used to estimate marker effects and their prediction error variances (PEV) in a computationally efficient manner. Here, we developed, implemented, and evaluated a direct, non-iterative method for the estimation of marker effects for the BayesC genomic prediction model. Methods: The BayesC model assumes a priori that markers have normally distributed effects with probability π and no effect with probability (1 − π). Marker effects and their PEV are estimated by using SVD and the posterior probability of the marker having a non-zero effect is calculated. These posterior probabilities are used to obtain marker-specific effect variances, which are subsequently used to approximate BayesC estimates of marker effects in a linear model. A computer simulation study was conducted to compare alternative genomic prediction methods, where a single reference generation was used to estimate marker effects, which were subsequently used for 10 generations of forward prediction, for which accuracies were evaluated. Results: SVD-based posterior probabilities of markers having non-zero effects were generally lower than MCMC-based posterior probabilities, but for some regions the opposite occurred, resulting in clear signals for QTL-rich regions. The accuracies of breeding values estimated using SVD- and MCMC-based BayesC analyses were similar across the 10 generations of forward prediction. For an intermediate number of generations (2 to 5) of forward prediction, accuracies obtained with the BayesC model tended to be slightly higher than accuracies obtained using the best linear unbiased prediction of SNP effects (SNP-BLUP model). When reducing marker density from WGS data to 30 K, SNP-BLUP tended to yield the highest accuracies, at least in the short term. Conclusions: Based on SVD of the genotype matrix, we developed a direct method for the calculation of BayesC estimates of marker effects. Although SVD- and MCMC-based marker effects differed slightly, their prediction accuracies were similar. Assuming that the SVD of the marker genotype matrix is already performed for other reasons (e.g. for SNP-BLUP), computation times for the BayesC predictions were comparable to those of SNP-BLUP. |
first_indexed | 2024-12-21T14:20:43Z |
format | Article |
id | doaj.art-f70c196068544bb7a7fac66a32147b8e |
institution | Directory Open Access Journal |
issn | 1297-9686 |
language | deu |
last_indexed | 2024-12-21T14:20:43Z |
publishDate | 2017-12-01 |
publisher | BMC |
record_format | Article |
series | Genetics Selection Evolution |
spelling | doaj.art-f70c196068544bb7a7fac66a32147b8e; 2022-12-21T19:00:48Z; deu; BMC; Genetics Selection Evolution; 1297-9686; 2017-12-01; 49119; 10.1186/s12711-017-0369-3; Variable selection models for genomic selection using whole-genome sequence data and singular value decomposition; Theo H. E. Meuwissen (Norwegian University of Life Sciences), Ulf G. Indahl (Norwegian University of Life Sciences), Jørgen Ødegård (Norwegian University of Life Sciences); Abstract. Background: Non-linear Bayesian genomic prediction models such as BayesA/B/C/R involve iteration and mostly Markov chain Monte Carlo (MCMC) algorithms, which are computationally expensive, especially when whole-genome sequence (WGS) data are analyzed. Singular value decomposition (SVD) of the genotype matrix can facilitate genomic prediction in large datasets, and can be used to estimate marker effects and their prediction error variances (PEV) in a computationally efficient manner. Here, we developed, implemented, and evaluated a direct, non-iterative method for the estimation of marker effects for the BayesC genomic prediction model. Methods: The BayesC model assumes a priori that markers have normally distributed effects with probability π and no effect with probability (1 − π). Marker effects and their PEV are estimated by using SVD and the posterior probability of the marker having a non-zero effect is calculated. These posterior probabilities are used to obtain marker-specific effect variances, which are subsequently used to approximate BayesC estimates of marker effects in a linear model. A computer simulation study was conducted to compare alternative genomic prediction methods, where a single reference generation was used to estimate marker effects, which were subsequently used for 10 generations of forward prediction, for which accuracies were evaluated. Results: SVD-based posterior probabilities of markers having non-zero effects were generally lower than MCMC-based posterior probabilities, but for some regions the opposite occurred, resulting in clear signals for QTL-rich regions. The accuracies of breeding values estimated using SVD- and MCMC-based BayesC analyses were similar across the 10 generations of forward prediction. For an intermediate number of generations (2 to 5) of forward prediction, accuracies obtained with the BayesC model tended to be slightly higher than accuracies obtained using the best linear unbiased prediction of SNP effects (SNP-BLUP model). When reducing marker density from WGS data to 30 K, SNP-BLUP tended to yield the highest accuracies, at least in the short term. Conclusions: Based on SVD of the genotype matrix, we developed a direct method for the calculation of BayesC estimates of marker effects. Although SVD- and MCMC-based marker effects differed slightly, their prediction accuracies were similar. Assuming that the SVD of the marker genotype matrix is already performed for other reasons (e.g. for SNP-BLUP), computation times for the BayesC predictions were comparable to those of SNP-BLUP.; http://link.springer.com/article/10.1186/s12711-017-0369-3 |
spellingShingle | Theo H. E. Meuwissen, Ulf G. Indahl, Jørgen Ødegård; Variable selection models for genomic selection using whole-genome sequence data and singular value decomposition; Genetics Selection Evolution |
title | Variable selection models for genomic selection using whole-genome sequence data and singular value decomposition |
title_full | Variable selection models for genomic selection using whole-genome sequence data and singular value decomposition |
title_fullStr | Variable selection models for genomic selection using whole-genome sequence data and singular value decomposition |
title_full_unstemmed | Variable selection models for genomic selection using whole-genome sequence data and singular value decomposition |
title_short | Variable selection models for genomic selection using whole-genome sequence data and singular value decomposition |
title_sort | variable selection models for genomic selection using whole genome sequence data and singular value decomposition |
url | http://link.springer.com/article/10.1186/s12711-017-0369-3 |
work_keys_str_mv | AT theohemeuwissen variableselectionmodelsforgenomicselectionusingwholegenomesequencedataandsingularvaluedecomposition AT ulfgindahl variableselectionmodelsforgenomicselectionusingwholegenomesequencedataandsingularvaluedecomposition AT jørgenødegard variableselectionmodelsforgenomicselectionusingwholegenomesequencedataandsingularvaluedecomposition |
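
The abstract notes that SVD of the genotype matrix can be used to estimate marker effects and their prediction error variances (PEV) efficiently. The following is a minimal sketch of that general idea for a plain ridge (SNP-BLUP-style) model, not the authors' BayesC implementation. The function name `snp_blup_svd`, the shrinkage parameter `lam` (assumed here to be the ratio of residual variance to marker-effect variance), the residual variance `sigma_e2`, and the simulated genotypes are all assumptions introduced for illustration.

```python
import numpy as np

def snp_blup_svd(X, y, lam, sigma_e2=1.0):
    """Ridge-type marker effects and their PEV from the thin SVD of X.

    X: (n, p) centered genotype matrix, y: (n,) phenotypes,
    lam: assumed ratio sigma_e2 / marker-effect variance.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # beta_hat = V diag(s / (s^2 + lam)) U' y
    beta_hat = Vt.T @ ((s / (s**2 + lam)) * (U.T @ y))
    # PEV = diag(sigma_e2 * (X'X + lam I)^-1), computed without
    # forming the p x p matrix:
    # diag_j = 1/lam - sum_k V_jk^2 * s_k^2 / (lam * (s_k^2 + lam))
    shrink = s**2 / (lam * (s**2 + lam))
    pev = sigma_e2 * (1.0 / lam - (Vt.T**2) @ shrink)
    return beta_hat, pev

# Hypothetical toy data: 50 animals, 200 markers coded 0/1/2.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.integers(0, 3, size=(n, p)).astype(float)
X -= X.mean(axis=0)                      # center each marker
true_beta = np.zeros(p)
true_beta[:5] = rng.normal(size=5)       # five simulated QTL
y = X @ true_beta + rng.normal(size=n)

lam = 10.0
beta_hat, pev = snp_blup_svd(X, y, lam)
```

Once `U`, `s`, and `Vt` are available, the effects and PEV need only matrix-vector products and one (p × k) product, which is why the SVD can be reused across analyses when the number of animals is much smaller than the number of markers.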