Use of multivariate analysis and machine learning methods to characterize traits contributing to wheat yield diversity

Aim of study: Wheat is the third largest staple food crop in the world, so determining the factors that affect its yield is of great importance. This study aimed to identify useful subsets of agronomic traits and to rank the traits by their importance to grain yield. Area of study: Fars provi...

Bibliographic Details
Main Authors: Ali BEHPOURI, Sara FAROKHZADEH, Zahra ZINATI, Zobeir KHOSRAVI
Format: Article
Language: English
Published: Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria 2023-02-01
Series: Spanish Journal of Agricultural Research
Subjects: Triticum aestivum; multivariate statistical analysis; partial least squares regression; support vector regression
Online Access: https://revistas.inia.es/index.php/sjar/article/view/19835
author Ali BEHPOURI
Sara FAROKHZADEH
Zahra ZINATI
Zobeir KHOSRAVI
collection DOAJ
description Aim of study: Wheat is the third largest staple food crop in the world, so determining the factors that affect its yield is of great importance. This study aimed to identify useful subsets of agronomic traits and to rank the traits by their importance to grain yield. Area of study: Fars province, Iran. Material and methods: Data on 22 agronomic traits were collected from 90 farms in six regions (Darab, Kavar, Marvdasht, Fasa, Lar, and Khonj), regarded as the most important wheat-growing regions of Fars province, Iran. Multivariate statistical analyses (correlation, stepwise regression, and principal component analysis (PCA)) and machine learning modeling approaches, namely partial least squares regression (PLSR) and support vector regression (SVR), were applied to the agronomic traits. Main results: The combined results of correlation, stepwise regression, and PCA showed that number of spikes m⁻², grain number spike⁻¹, and thousand-grain weight had the major impact on yield, followed by awn length, spike length, narrow leaf herbicide, broadleaf herbicide, time to plant maturity (month), and soil salinity. In addition, PLSR with the nine selected traits as inputs showed better prediction capability (R² = 85%, RMSE = 0.32, MSE = 0.10, and BIAS = -0.05) than PLSR with all twenty-two traits as inputs. Research highlights: Integrating multivariate statistical analyses with machine learning regression methods could be a powerful way to identify traits that have a significant impact on yield. These findings can inform future breeding programs.
first_indexed 2024-04-10T05:18:09Z
format Article
id doaj.art-6d1912dc9eaa4e3c82a34db1a66688ba
institution Directory Open Access Journal
issn 2171-9292
language English
last_indexed 2024-04-10T05:18:09Z
publishDate 2023-02-01
publisher Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria
record_format Article
series Spanish Journal of Agricultural Research
spelling doaj.art-6d1912dc9eaa4e3c82a34db1a66688ba 2023-03-08T13:03:03Z eng Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria, Spanish Journal of Agricultural Research, 2171-9292, 2023-02-01, 21(1), doi:10.5424/sjar/2023211-19835
Ali BEHPOURI, Sara FAROKHZADEH, Zahra ZINATI, Zobeir KHOSRAVI (all: Department of Agroecology, College of Agriculture and Natural Resources of Darab, Shiraz University, Iran)
title Use of multivariate analysis and machine learning methods to characterize traits contributing to wheat yield diversity
topic Triticum aestivum
multivariate statistical analysis
partial least squares regression
support vector regression
url https://revistas.inia.es/index.php/sjar/article/view/19835
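
The abstract reports four fit statistics for the PLSR model (R2, RMSE, MSE, and BIAS) without defining them. The sketch below shows one common way such statistics are computed from observed and predicted yields. It is an illustration only, not the authors' code: the function name regression_metrics, the sample numbers, and the definition of bias as the mean of (predicted - observed) are assumptions, and the record does not say which definitions the study used. Note that R2 is returned here as a fraction, whereas the abstract quotes it as a percentage.

```python
import math


def regression_metrics(observed, predicted):
    """Return MSE, RMSE, bias, and R^2 for paired observed/predicted values.

    Assumed definitions (not confirmed by the record):
      bias = mean(predicted - observed)
      R^2  = 1 - SS_res / SS_tot
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")

    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mse = ss_res / n                               # mean squared error
    rmse = math.sqrt(mse)                          # root mean squared error
    bias = sum(errors) / n                         # mean signed error

    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot                     # coefficient of determination

    return {"mse": mse, "rmse": rmse, "bias": bias, "r2": r2}


# Hypothetical yields, invented for illustration; not data from the study.
observed = [4.0, 5.0, 6.0, 7.0]
predicted = [4.5, 4.5, 6.5, 6.5]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Under these definitions RMSE is the square root of MSE, which agrees with the reported pair: 0.32 squared is about 0.10.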