Genomic Prediction of Wheat Grain Yield Using Machine Learning

Genomic Prediction (GP) is a powerful approach for inferring complex phenotypes from genetic markers. GP is critical for improving grain yield, particularly for staple crops such as wheat and rice, which are crucial to feeding the world. While machine learning (ML) models have recently started to be...

Bibliographic Details
Main Authors:	Manisha Sanjay Sirsat, Paula Rodrigues Oblessuc, Ricardo S. Ramiro
Format:	Article
Language:	English
Published:	MDPI AG 2022-09-01
Series:	Agriculture
Subjects:	genomic prediction machine learning random forests gradient boosting Bayesian methods penalized regression
Online Access:	https://www.mdpi.com/2077-0472/12/9/1406

author	Manisha Sanjay Sirsat Paula Rodrigues Oblessuc Ricardo S. Ramiro
collection	DOAJ
description	Genomic Prediction (GP) is a powerful approach for inferring complex phenotypes from genetic markers. GP is critical for improving grain yield, particularly for staple crops such as wheat and rice, which are crucial to feeding the world. While machine learning (ML) models have recently started to be applied in GP, it is often unclear which algorithms perform best and how their results are affected by feature selection (FS) methods. Here, we compared ML and deep learning (DL) algorithms with classical Bayesian approaches, across a range of FS methods, for their performance in predicting wheat grain yield in three datasets. Model performance was generally affected more by the prediction algorithm than by the FS method. Among all models, the best performance was obtained with tree-based ML methods (random forests and gradient boosting) and with classical Bayesian methods. However, the latter were prone to fitting problems. This issue was also observed for models developed with features selected by BayesA, the only Bayesian FS method used here. In contrast, the three other FS methods led to models with similar performance but no fitting problems. Thus, our results indicate that the choice of prediction algorithm is more important than the choice of FS method for developing highly predictive models. Moreover, we conclude that random forests and gradient boosting algorithms generate highly predictive and robust wheat grain yield GP models.
first_indexed	2024-03-10T01:02:00Z
format	Article
id	doaj.art-e11c43566c274471b37f02b0915b3d18
institution	Directory of Open Access Journals
issn	2077-0472
language	English
last_indexed	2024-03-10T01:02:00Z
publishDate	2022-09-01
publisher	MDPI AG
record_format	Article
series	Agriculture
spelling	doaj.art-e11c43566c274471b37f02b0915b3d18; 2023-11-23T14:33:25Z; eng; MDPI AG; Agriculture; 2077-0472; 2022-09-01; volume 12, issue 9, article 1406; 10.3390/agriculture12091406; Genomic Prediction of Wheat Grain Yield Using Machine Learning; Manisha Sanjay Sirsat (Department of Data Management and Risk Analysis, InnovPlantProtect, 7350-478 Elvas, Portugal); Paula Rodrigues Oblessuc (Department of Protection of Specific Crops, InnovPlantProtect, 7350-478 Elvas, Portugal); Ricardo S. Ramiro (Department of Data Management and Risk Analysis, InnovPlantProtect, 7350-478 Elvas, Portugal)
title	Genomic Prediction of Wheat Grain Yield Using Machine Learning
topic	genomic prediction machine learning random forests gradient boosting Bayesian methods penalized regression
url	https://www.mdpi.com/2077-0472/12/9/1406


Similar Items