Genomic Prediction of Wheat Grain Yield Using Machine Learning
Genomic Prediction (GP) is a powerful approach for inferring complex phenotypes from genetic markers. GP is critical for improving grain yield, particularly for staple crops such as wheat and rice, which are crucial to feeding the world. While machine learning (ML) models have recently started to be...
Main Authors: | Manisha Sanjay Sirsat, Paula Rodrigues Oblessuc, Ricardo S. Ramiro |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-09-01 |
Series: | Agriculture |
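Subjects: | genomic prediction, machine learning, random forests, gradient boosting, Bayesian methods, penalized regression |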
Online Access: | https://www.mdpi.com/2077-0472/12/9/1406 |
_version_ | 1797492322927116288 |
---|---|
author | Manisha Sanjay Sirsat Paula Rodrigues Oblessuc Ricardo S. Ramiro |
collection | DOAJ |
description | Genomic Prediction (GP) is a powerful approach for inferring complex phenotypes from genetic markers. GP is critical for improving grain yield, particularly for staple crops such as wheat and rice, which are crucial to feeding the world. While machine learning (ML) models have recently started to be applied in GP, it is often unclear which algorithms perform best and how their results are affected by feature selection (FS) methods. Here, we compared ML and deep learning (DL) algorithms with classical Bayesian approaches, across a range of different FS methods, for their performance in predicting wheat grain yield in three datasets. Model performance was generally more affected by the prediction algorithm than by the FS method. Among all models, the best performance was obtained for tree-based ML methods (random forests and gradient boosting) and for classical Bayesian methods. However, the latter were prone to fitting problems. This issue was also observed for models developed with features selected by BayesA, the only Bayesian FS method used here. Nonetheless, the three other FS methods led to models with no fitting problems and similar performance. Thus, our results indicate that the choice of prediction algorithm is more important than the choice of FS method for developing highly predictive models. Moreover, we conclude that random forests and gradient boosting algorithms generate highly predictive and robust wheat grain yield GP models. |
first_indexed | 2024-03-10T01:02:00Z |
format | Article |
id | doaj.art-e11c43566c274471b37f02b0915b3d18 |
institution | Directory of Open Access Journals |
issn | 2077-0472 |
language | English |
last_indexed | 2024-03-10T01:02:00Z |
publishDate | 2022-09-01 |
publisher | MDPI AG |
record_format | Article |
series | Agriculture |
spelling | doaj.art-e11c43566c274471b37f02b0915b3d18; 2023-11-23T14:33:25Z; eng; MDPI AG; Agriculture; ISSN 2077-0472; 2022-09-01; vol. 12, no. 9, article 1406; DOI 10.3390/agriculture12091406; Genomic Prediction of Wheat Grain Yield Using Machine Learning; Manisha Sanjay Sirsat (Department of Data Management and Risk Analysis, InnovPlantProtect, 7350-478 Elvas, Portugal); Paula Rodrigues Oblessuc (Department of Protection of Specific Crops, InnovPlantProtect, 7350-478 Elvas, Portugal); Ricardo S. Ramiro (Department of Data Management and Risk Analysis, InnovPlantProtect, 7350-478 Elvas, Portugal); https://www.mdpi.com/2077-0472/12/9/1406; keywords: genomic prediction, machine learning, random forests, gradient boosting, Bayesian methods, penalized regression |
title | Genomic Prediction of Wheat Grain Yield Using Machine Learning |
topic | genomic prediction machine learning random forests gradient boosting Bayesian methods penalized regression |