Large-scale genomic prediction using singular value decomposition of the genotype matrix

Abstract Background For marker effect models and genomic animal models, computational requirements increase with the number of loci and the number of genotyped individuals, respectively. In the latter case, the inverse genomic relationship matrix (GRM) is typically needed, which is computationally d...

Bibliographic Details
Main Authors:	Jørgen Ødegård, Ulf Indahl, Ismo Strandén, Theo H. E. Meuwissen
Format:	Article
Language:	deu
Published:	BMC 2018-02-01
Series:	Genetics Selection Evolution
Online Access:	http://link.springer.com/article/10.1186/s12711-018-0373-2

_version_	1818806823078068224
author	Jørgen Ødegård Ulf Indahl Ismo Strandén Theo H. E. Meuwissen
author_sort	Jørgen Ødegård
collection	DOAJ
description	Abstract. Background: For marker effect models and genomic animal models, computational requirements increase with the number of loci and the number of genotyped individuals, respectively. In the latter case, the inverse genomic relationship matrix (GRM) is typically needed, which is computationally demanding to compute for large datasets. Thus, there is a great need for dimensionality-reduction methods that can analyze massive genomic data. For this purpose, we developed reduced-dimension singular value decomposition (SVD) based models for genomic prediction. Methods: Fast SVD is performed by analyzing different chromosomes/genome segments in parallel and/or by restricting SVD to a limited core of genotyped individuals, producing chromosome- or segment-specific principal components (PC). Given a limited effective population size, nearly all the genetic variation can be captured by a limited number of PC. Genomic prediction can then be performed either by PC ridge regression (PCRR) or by genomic animal models using an inverse GRM computed from the chosen PC (PCIG). In the latter case, computation of the inverse GRM is feasible for any number of genotyped individuals, and the inverse can be readily produced row- or element-wise. Results: Using simulated data, we show that PCRR and PCIG models, using chromosome-wise SVD of a core sample of individuals, are appropriate for genomic prediction in a larger population and yield predicted breeding values virtually identical to those of the original full-dimension genomic model (r = 1.000). Compared with other algorithms (e.g. algorithm for proven and young animals, APY), the (chromosome-wise SVD-based) PCRR and PCIG models were more robust to the size of the core sample, giving nearly identical results even down to 500 core individuals. The method was also successfully tested on a large multi-breed dataset. Conclusions: SVD can be used for dimensionality reduction of large genomic datasets. After SVD, genomic prediction using dense genomic data and many genotyped individuals can be done in a computationally efficient manner. Using this method, the resulting genomic estimated breeding values were virtually identical to those computed from a full-dimension genomic model.
first_indexed	2024-12-18T19:15:53Z
format	Article
id	doaj.art-83d72d2b2be64685ae503b8c011bbeb5
institution	Directory Open Access Journal
issn	1297-9686
language	deu
last_indexed	2024-12-18T19:15:53Z
publishDate	2018-02-01
publisher	BMC
record_format	Article
series	Genetics Selection Evolution
spelling	doaj.art-83d72d2b2be64685ae503b8c011bbeb5; 2022-12-21T20:56:08Z; deu; BMC; Genetics Selection Evolution; 1297-9686; 2018-02-01; 50; 1; 1; 12; 10.1186/s12711-018-0373-2; Large-scale genomic prediction using singular value decomposition of the genotype matrix; Jørgen Ødegård (0); Ulf Indahl (1); Ismo Strandén (2); Theo H. E. Meuwissen (3); AquaGen AS; Norwegian University of Life Sciences; Natural Resources Institute Finland (Luke); Norwegian University of Life Sciences; http://link.springer.com/article/10.1186/s12711-018-0373-2
title	Large-scale genomic prediction using singular value decomposition of the genotype matrix
url	http://link.springer.com/article/10.1186/s12711-018-0373-2
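
The PC ridge regression (PCRR) idea summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `pc_ridge_gebv` and `snp_ridge_gebv`, the ridge parameter `lam`, and the `var_explained` cutoff are assumptions introduced here for illustration, and the chromosome-wise SVD, the core-sample restriction, and the PCIG inverse GRM described in the abstract are not shown. The sketch only demonstrates the underlying algebra: ridge regression on all principal components of a centered genotype matrix gives the same predictions as ridge regression on the markers themselves, and fewer components give an approximation.

```python
import numpy as np


def pc_ridge_gebv(Z, y, lam, var_explained=1.0):
    """Ridge regression on principal components of a centered genotype matrix.

    Z: (n_individuals, n_loci) column-centered genotypes (assumed input).
    y: (n_individuals,) phenotypes.
    lam: ridge parameter (hypothetical; chosen by the user).
    var_explained: fraction of genotype variance the kept PCs must capture.
    Returns the predicted genomic values and the number of PCs used.
    """
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    # Discard numerically zero singular values (centering removes one rank).
    tol = s.max() * max(Z.shape) * np.finfo(float).eps
    rank = int(np.sum(s > tol))
    cum = np.cumsum(s[:rank] ** 2) / np.sum(s[:rank] ** 2)
    k = min(rank, int(np.searchsorted(cum, var_explained)) + 1)
    T = U[:, :k] * s[:k]            # PC scores, T = U_k S_k
    yc = y - y.mean()               # the mean is orthogonal to centered Z
    # T'T is diagonal (S_k^2), so the ridge system is solved element-wise.
    coef = (T.T @ yc) / (s[:k] ** 2 + lam)
    return T @ coef, k


def snp_ridge_gebv(Z, y, lam):
    """Full-dimension ridge regression on the markers, for comparison."""
    yc = y - y.mean()
    a = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ yc)
    return Z @ a


# Small simulated example (arbitrary sizes and parameters).
rng = np.random.default_rng(1)
n, m = 60, 200
M = rng.binomial(2, 0.3, size=(n, m)).astype(float)   # 0/1/2 genotype codes
Z = M - M.mean(axis=0)
y = Z @ rng.normal(0.0, 0.1, size=m) + rng.normal(0.0, 1.0, size=n)

g_full = snp_ridge_gebv(Z, y, lam=50.0)
g_pc, k_all = pc_ridge_gebv(Z, y, lam=50.0)
g_red, k_red = pc_ridge_gebv(Z, y, lam=50.0, var_explained=0.9)
print("PCs used (all / reduced):", k_all, k_red)
print("max |full - all PCs|:", np.max(np.abs(g_full - g_pc)))
print("corr(full, reduced PCs):", np.corrcoef(g_full, g_red)[0, 1])
```

Because the ridge system in PC space is diagonal, the cost after the SVD depends on the number of retained components rather than on the number of loci. The simulated genotypes here are independent across loci, so the reduced model needs many components; the abstract's claim that few components suffice relies on a limited effective population size, which this toy data does not reproduce.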

Similar Items