Utilization of Explainable Machine Learning Algorithms for Determination of Important Features in ‘Suncrest’ Peach Maturity Prediction

Peaches (Prunus persica (L.) Batsch) are a popular fruit in Europe and Croatia. Maturity at harvest has a crucial influence on peach fruit quality, storage life, and consequently consumer acceptance. The main goal of this study is to develop a machine learning model that will dete...

Bibliographic Details
Main Authors: Dejan Ljubobratović, Marko Vuković, Marija Brkić Bakarić, Tomislav Jemrić, Maja Matetić
Format: Article
Language: English
Published: MDPI AG 2021-12-01
Series: Electronics
Subjects:
Online Access: https://www.mdpi.com/2079-9292/10/24/3115
author Dejan Ljubobratović
Marko Vuković
Marija Brkić Bakarić
Tomislav Jemrić
Maja Matetić
collection DOAJ
description Peaches (Prunus persica (L.) Batsch) are a popular fruit in Europe and Croatia. Maturity at harvest has a crucial influence on peach fruit quality, storage life, and consequently consumer acceptance. The main goal of this study is to develop a machine learning model that will detect the most important features for predicting peach maturity by first training models and then using the importance ratings of these models to detect nonlinear (and linear) relationships. Thus, the most important peach features at a given stage of its ripening could be revealed. To date, this method has not been used for this purpose, and at the same time, it has the potential to be applied to other similar peach varieties. A total of 33 fruit features are measured on the harvested peaches, and three imbalanced datasets are created using firmness thresholds of 1.84, 3.57, and 4.59 kg·cm⁻². These datasets are balanced using the SMOTE and ROSE techniques, and the Random Forest machine learning model is trained on them. Permutation Feature Importance (PFI), Variable Importance (VI), and LIME interpretability methods are used to detect variables that most influence predictions in the given machine learning models. PFI shows that the h° and a* ground color parameters, COL ground color index, SSC/TA, and TA inner quality parameters are among the top ten most contributing variables in all three models. Meanwhile, VI shows that this is the case for the a* ground color parameter, COL and CCL ground color indexes, and the SSC/TA inner quality parameter. The fruit flesh ratio is highly positioned (among the top three according to PFI) in two models, but it is not even among the top ten in the third.
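The Permutation Feature Importance idea mentioned in the description can be illustrated with a minimal sketch. This is not the authors' code or their Random Forest: the data, the `hue_angle` and `noise` features, the cutoff of 85, and the `toy_model` rule are all hypothetical stand-ins, and accuracy is assumed as the score. It only shows the general mechanism: shuffle one feature column, re-score the model, and treat the mean drop in accuracy as that feature's importance.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, breaking its link to the labels.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: each row is [hue_angle, noise]; label 1 means "mature".
# The cutoff of 85 is arbitrary and not taken from the study.
data_rng = random.Random(42)
X = [[data_rng.uniform(60, 110), data_rng.uniform(0, 1)] for _ in range(200)]
y = [1 if row[0] < 85 else 0 for row in X]


def toy_model(row):
    """Hypothetical stand-in for a trained classifier; it ignores `noise`."""
    return 1 if row[0] < 85 else 0


scores = permutation_importance(toy_model, X, y)
for name, score in zip(["hue_angle", "noise"], scores):
    print(f"{name}: {score:.3f}")
```

In this toy setup, shuffling `noise` leaves accuracy unchanged, so its importance is zero, while shuffling `hue_angle` lowers accuracy noticeably. The same procedure applies to any fitted model that can be scored on held-out data.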
first_indexed 2024-03-10T04:15:09Z
format Article
id doaj.art-8f815f40883a4e6eafd48ed5c7db8a14
institution Directory Open Access Journal
issn 2079-9292
language English
last_indexed 2024-03-10T04:15:09Z
publishDate 2021-12-01
publisher MDPI AG
record_format Article
series Electronics
spelling doaj.art-8f815f40883a4e6eafd48ed5c7db8a14 2023-11-23T08:02:19Z eng MDPI AG Electronics 2079-9292 2021-12-01 Volume 10, Issue 24, Article 3115 DOI: 10.3390/electronics10243115
Dejan Ljubobratović: Department of Informatics, University of Rijeka, Radmile Matejčić 2, 51000 Rijeka, Croatia
Marko Vuković: Faculty of Agriculture, Unit of Horticulture and Landscape Architecture, Department of Pomology, University of Zagreb, Svetošimunska c. 25, 10000 Zagreb, Croatia
Marija Brkić Bakarić: Department of Informatics, University of Rijeka, Radmile Matejčić 2, 51000 Rijeka, Croatia
Tomislav Jemrić: Faculty of Agriculture, Unit of Horticulture and Landscape Architecture, Department of Pomology, University of Zagreb, Svetošimunska c. 25, 10000 Zagreb, Croatia
Maja Matetić: Department of Informatics, University of Rijeka, Radmile Matejčić 2, 51000 Rijeka, Croatia
title Utilization of Explainable Machine Learning Algorithms for Determination of Important Features in ‘Suncrest’ Peach Maturity Prediction
topic machine learning
imbalanced datasets
peach maturity
variable importance
interpretable machine learning
random forest
url https://www.mdpi.com/2079-9292/10/24/3115