Comparing Genomic Prediction Models by Means of Cross Validation

Bibliographic Details
Main Authors: Matías F. Schrauf, Gustavo de los Campos, Sebastián Munilla
Format: Article
Language: English
Published: Frontiers Media S.A. 2021-11-01
Series: Frontiers in Plant Science
Subjects: genomic selection; cross validation; plant breeding; genomic models; model selection
Online Access: https://www.frontiersin.org/articles/10.3389/fpls.2021.734512/full
collection DOAJ
description In the two decades of continuous development of genomic selection, a great variety of models have been proposed to make predictions from the information available in dense marker panels. Besides deciding which particular model to use, practitioners also need to make many minor choices for those parameters in the model which are not typically estimated from the data (so-called “hyper-parameters”). When the focus is placed on predictions, most of these decisions are made with the aim of optimizing predictive accuracy. Here we discuss, and illustrate using publicly available crop datasets, the use of cross validation to make many such decisions. In particular, we emphasize the importance of paired comparisons to achieve high power in the comparison between candidate models, as well as the need to define notions of relevance in the difference between their performances. Regarding the latter, we borrow the idea of equivalence margins from clinical research and introduce new statistical tests. We conclude that most hyper-parameters can be learnt from the data by either minimizing REML or by using weakly-informative priors, with good predictive results. In particular, the default options in a popular software package are generally competitive with the optimal values. With regard to the performance assessments themselves, we conclude that the paired k-fold cross validation is a generally applicable and statistically powerful methodology to assess differences in model accuracies. Coupled with the definition of equivalence margins based on expected genetic gain, it becomes a useful tool for breeders.
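The paired k-fold comparison mentioned in the description can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure or their new tests: the data are simulated, `fit_predict` is a hypothetical toy shrinkage predictor standing in for a genomic model, the two candidates differ only in one hyper-parameter (`lam`), and the equivalence check is a generic interval-inclusion (TOST-style) rule with an arbitrary placeholder margin of 0.05. The critical value 1.833 is the 0.95 quantile of Student's t with 9 degrees of freedom, matching 10 folds. Fold-wise accuracies are not strictly independent, so the interval should be read as approximate.

```python
import math
import random
import statistics


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 once and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def pearson(x, y):
    """Pearson correlation, used here as the measure of predictive accuracy."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_predict(X, y, train, test, lam):
    """Toy predictor (hypothetical stand-in for a genomic model):
    per-marker regression on the training set, each effect shrunk by lam."""
    p = len(X[0])
    y_bar = statistics.fmean(y[i] for i in train)
    x_bar = [statistics.fmean(X[i][j] for i in train) for j in range(p)]
    beta = []
    for j in range(p):
        sxy = sum((X[i][j] - x_bar[j]) * (y[i] - y_bar) for i in train)
        sxx = sum((X[i][j] - x_bar[j]) ** 2 for i in train)
        beta.append(sxy / (sxx + lam))
    return [y_bar + sum((X[i][j] - x_bar[j]) * beta[j] for j in range(p))
            for i in test]


def paired_cv(X, y, lam_a, lam_b, k=10, seed=0):
    """Evaluate both candidates on the SAME folds so accuracies pair up."""
    folds = kfold_indices(len(y), k, seed)
    acc_a, acc_b = [], []
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        obs = [y[i] for i in test]
        acc_a.append(pearson(fit_predict(X, y, train, test, lam_a), obs))
        acc_b.append(pearson(fit_predict(X, y, train, test, lam_b), obs))
    return acc_a, acc_b


def paired_summary(acc_a, acc_b, t_crit, margin):
    """Mean fold-wise difference, its t-interval, and an equivalence verdict."""
    d = [a - b for a, b in zip(acc_a, acc_b)]
    mean_d = statistics.fmean(d)
    se = statistics.stdev(d) / math.sqrt(len(d))
    lo, hi = mean_d - t_crit * se, mean_d + t_crit * se
    # Declared equivalent only if the whole interval lies inside the margin.
    return {"mean_diff": mean_d, "se": se, "ci": (lo, hi),
            "equivalent": -margin < lo and hi < margin}


# Simulated data (assumption): 200 lines, 50 markers coded 0/1/2.
rng = random.Random(1)
n, p = 200, 50
X = [[rng.choice((0, 1, 2)) for _ in range(p)] for _ in range(n)]
effects = [rng.gauss(0, 0.3) for _ in range(p)]
y = [sum(x * b for x, b in zip(row, effects)) + rng.gauss(0, 1) for row in X]

acc_a, acc_b = paired_cv(X, y, lam_a=1.0, lam_b=100.0, k=10)
print(paired_summary(acc_a, acc_b, t_crit=1.833, margin=0.05))
```

Because both candidates see identical training and test partitions, fold-to-fold variation that affects them equally cancels in the differences, which is what gives the paired design its power. In practice the margin would be set from a quantity of practical interest, such as the expected genetic gain the description refers to, rather than the placeholder used here.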
first_indexed 2024-12-19T03:15:03Z
id doaj.art-d8bb4b08605549eeb8fc88dc3c3b7037
institution Directory Open Access Journal
issn 1664-462X
last_indexed 2024-12-19T03:15:03Z
spelling doaj.art-d8bb4b08605549eeb8fc88dc3c3b7037; 2022-12-21T20:37:54Z; eng; Frontiers Media S.A.; Frontiers in Plant Science; 1664-462X; 2021-11-01; 12; 10.3389/fpls.2021.734512; 734512; Comparing Genomic Prediction Models by Means of Cross Validation
Matías F. Schrauf (0): Facultad de Agronomía, Universidad de Buenos Aires, Buenos Aires, Argentina
Matías F. Schrauf (1): Animal Breeding & Genomics, Wageningen Livestock Research, Wageningen University & Research, Wageningen, Netherlands
Gustavo de los Campos (2): Departments of Epidemiology, Biostatistics, Statistics, and Probability, Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, United States
Sebastián Munilla (3): Facultad de Agronomía, Universidad de Buenos Aires, Buenos Aires, Argentina
Sebastián Munilla (4): Instituto de Investigaciones en Producción Animal (INPA), CONICET-Universidad de Buenos Aires, Buenos Aires, Argentina
title Comparing Genomic Prediction Models by Means of Cross Validation
topic genomic selection
cross validation
plant breeding
genomic models
model selection