Genomic Prediction of Wheat Grain Yield Using Machine Learning

Bibliographic Details
Main Authors: Manisha Sanjay Sirsat, Paula Rodrigues Oblessuc, Ricardo S. Ramiro
Format: Article
Language: English
Published: MDPI AG 2022-09-01
Series: Agriculture
Subjects:
Online Access:https://www.mdpi.com/2077-0472/12/9/1406
author Manisha Sanjay Sirsat
Paula Rodrigues Oblessuc
Ricardo S. Ramiro
collection DOAJ
description Genomic Prediction (GP) is a powerful approach for inferring complex phenotypes from genetic markers. GP is critical for improving grain yield, particularly in staple crops such as wheat and rice, which are crucial to feeding the world. While machine learning (ML) models have recently started to be applied in GP, it is often unclear which algorithms perform best and how their results are affected by the feature selection (FS) method. Here, we compared ML and deep learning (DL) algorithms with classical Bayesian approaches, across a range of FS methods, for their performance in predicting wheat grain yield in three datasets. Model performance was generally more affected by the prediction algorithm than by the FS method. The best performance was obtained with tree-based ML methods (random forests and gradient boosting) and with classical Bayesian methods. However, the Bayesian methods were prone to fitting problems. This issue was also observed for models developed with features selected by BayesA, the only Bayesian FS method used here. In contrast, the three other FS methods led to models with no fitting problems while achieving similar performance. Thus, our results indicate that the choice of prediction algorithm matters more than the choice of FS method for developing highly predictive models. Moreover, we conclude that random forests and gradient boosting generate highly predictive and robust GP models for wheat grain yield.
first_indexed 2024-03-10T01:02:00Z
format Article
id doaj.art-e11c43566c274471b37f02b0915b3d18
institution Directory Open Access Journal
issn 2077-0472
language English
last_indexed 2024-03-10T01:02:00Z
publishDate 2022-09-01
publisher MDPI AG
record_format Article
series Agriculture
spelling doaj.art-e11c43566c274471b37f02b0915b3d18 2023-11-23T14:33:25Z
Agriculture (ISSN 2077-0472), MDPI AG, 2022-09-01, vol. 12, no. 9, art. 1406, English
DOI 10.3390/agriculture12091406
Manisha Sanjay Sirsat: Department of Data Management and Risk Analysis, InnovPlantProtect, 7350-478 Elvas, Portugal
Paula Rodrigues Oblessuc: Department of Protection of Specific Crops, InnovPlantProtect, 7350-478 Elvas, Portugal
Ricardo S. Ramiro: Department of Data Management and Risk Analysis, InnovPlantProtect, 7350-478 Elvas, Portugal
title Genomic Prediction of Wheat Grain Yield Using Machine Learning
topic genomic prediction
machine learning
random forests
gradient boosting
Bayesian methods
penalized regression
url https://www.mdpi.com/2077-0472/12/9/1406
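The comparison summarized in the description (prediction algorithms crossed with feature-selection methods, scored on held-out lines) can be illustrated with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' pipeline: the data are synthetic (200 hypothetical lines, 1000 SNP markers coded 0/1/2, an assumed additive trait), univariate F-test selection (`SelectKBest` with `f_regression`) stands in for the paper's FS methods, and accuracy is taken as the Pearson correlation between observed values and cross-validated predictions, a common GP metric. The helper `evaluate`, the dataset sizes, the effect sizes and `k=200` are illustrative choices.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Hypothetical data: 200 lines x 1000 SNP markers coded as 0/1/2 allele counts.
n_lines, n_markers = 200, 1000
X = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float)

# Assumed additive trait: 20 causal markers with random effects plus noise.
causal = rng.choice(n_markers, size=20, replace=False)
effects = rng.normal(0.0, 1.0, size=20)
y = X[:, causal] @ effects + rng.normal(0.0, 2.0, size=n_lines)


def evaluate(model, k=200):
    """Return the Pearson r between observed values and 5-fold CV predictions.

    SelectKBest with a univariate F-test stands in for an FS method. It sits
    inside the pipeline, so markers are re-selected on each training fold and
    the held-out lines never influence which markers are kept.
    """
    pipe = make_pipeline(SelectKBest(f_regression, k=k), model)
    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    y_hat = cross_val_predict(pipe, X, y, cv=cv)  # one out-of-fold prediction per line
    return np.corrcoef(y, y_hat)[0, 1]            # correlation pooled over all folds


models = {
    "random forest": RandomForestRegressor(n_estimators=300, random_state=0),
    "gradient boosting": GradientBoostingRegressor(random_state=0),
}
scores = {name: evaluate(model) for name, model in models.items()}
for name, r in scores.items():
    print(f"{name}: r = {r:.3f}")
```

Placing the selector inside the pipeline is the key design choice: selecting markers on the full dataset before cross-validation would leak information from the test folds into training and inflate the apparent accuracy. Swapping in other regressors or selectors only changes the entries passed to `evaluate`.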