Prediction of rice biomass using machine learning algorithms


Bibliographic Details
Main Author:	Radhwane, Derraz
Format:	Thesis
Language:	English
Published:	2022
Subjects:	Rice - Yields; Machine learning; Plant biomass
Online Access:	http://psasir.upm.edu.my/id/eprint/104544/1/FP%202022%2070%20-%20IR.pdf

_version_	1825939057583063040
author	Radhwane, Derraz
collection	UPM
description	Conventional rice sampling methods are effective. However, they are destructive, laborious, time-consuming, impractical for large fields, and subject to human error. Unmanned aerial vehicles (UAVs) may address these issues. Machine learning algorithms (MLs) can predict rice biomass from UAV-based vegetation indices (VIs). Nevertheless, VIs are highly collinear and noisy, and collecting large VI datasets is expensive. These issues affect the MLs' model performance, stability (under-/overfitting), variance, and confidence. This study aims to: (i) compare the model performance, variance, stability, and confidence of base and ensemble MLs for predicting rice biomass from collinear (multicollinearity context, MCC) and non-collinear (non-multicollinearity context, NMCC) VIs; (ii) compare the predictability of rice above-ground biomass (TAGB) from noisy and Kalman-filter-denoised VIs using a histogram gradient boosting regressor (HGBR); and (iii) develop a trigonometric-Euclidean-smoother interpolator (TESI), including linear (LN-TESI), cubic (C-TESI), quadratic (Q-TESI), and logarithmic (L-TESI) variants, for augmenting continuous time-series and non-time-series VI data, and compare it with the tabular variational autoencoder (TVAE) and the conditional tabular generative adversarial network (CTGAN) for preventing deep neural network (DNN) under-/overfitting. A split-plot randomised complete block design (RCBD) experiment with 120 quadrats was conducted in a rice granary in Terengganu, Malaysia. Five rice biomass traits were measured in each quadrat at the tillering, booting, and milking stages. A MicaSense RedEdge multispectral camera mounted on a DJI quadcopter was used to acquire the blue, green, red, red-edge, and near-infrared (NIR) bands, from which the VI values of each quadrat were extracted. Besides the biomass dataset, a non-time-series fertiliser dataset and time-series oil palm and rice datasets were collected to validate the TESI, TVAE, and CTGAN results.
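Objective (ii) relies on Kalman-filter denoising of the VIs, but the record does not state the filter configuration. The following is therefore only a minimal sketch, assuming a scalar random-walk state model applied to a single VI series; the function name kalman_denoise and the parameters process_var and measurement_var are hypothetical illustrations, not the author's settings.

```python
def kalman_denoise(observations, process_var=1e-4, measurement_var=1e-2):
    """Smooth a 1-D vegetation-index series with a scalar Kalman filter.

    Assumed (hypothetical) model, not taken from the thesis:
      state:       x_t = x_{t-1} + w_t,  w_t ~ N(0, process_var)   (random walk)
      observation: z_t = x_t + v_t,      v_t ~ N(0, measurement_var)
    """
    if not observations:
        return []
    estimate = observations[0]    # initialise the state with the first reading
    error_var = measurement_var   # uncertainty of that initial estimate
    filtered = [estimate]
    for z in observations[1:]:
        # Predict: the random walk keeps the mean but adds process noise.
        prior_var = error_var + process_var
        # Update: the Kalman gain weights the new reading against the prediction.
        gain = prior_var / (prior_var + measurement_var)
        estimate = estimate + gain * (z - estimate)
        error_var = (1.0 - gain) * prior_var
        filtered.append(estimate)
    return filtered


if __name__ == "__main__":
    # Synthetic NDVI-like readings: a constant 0.6 plus alternating +/-0.05 noise.
    noisy = [0.6 + (0.05 if i % 2 == 0 else -0.05) for i in range(10)]
    print([round(v, 3) for v in kalman_denoise(noisy)])
```

With process_var small relative to measurement_var, the filter trusts its running estimate more than each new reading, so alternating measurement noise is strongly damped; a larger process_var would let the estimate follow genuine phenological change more quickly.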
For the first objective, the MLs' model performance and stability were better in the MCC than in the NMCC for all rice biomass traits, and the ensemble MLs outperformed the base MLs for all traits in both contexts. Both base and ensemble MLs showed inconsistent patterns in the variances of the coefficient of determination (R²) and root mean squared error (RMSE) across the MCC and NMCC. Neither multicollinearity nor the base-versus-ensemble distinction affected model confidence; instead, confidence depended on the interaction between the ML algorithm and the dataset characteristics. For the second objective, the denoised VIs (R² = 0.74–0.95, RMSE = 2.43–13.94 g q⁻¹) outperformed the noisy VIs (R² = 0.63–0.90, RMSE = 3.28–17.91 g q⁻¹) for TAGB prediction. With the denoised VIs, R² was highest at the booting stage (R² = 0.93–0.95, RMSE = 8.22–9.30 g q⁻¹), followed by the tillering (R² = 0.75–0.84, RMSE = 2.43–2.96 g q⁻¹) and milking stages (R² = 0.74–0.80, RMSE = 13.34–13.94 g q⁻¹). The HGBR showed the least overfitting on the denoised VIs at the booting stage, with a training–testing change in R² (ΔR²) of 0.02–0.09 and in RMSE (ΔRMSE) of 1.93–6.54 g q⁻¹, followed by the tillering (ΔR² = 0.08–0.21, ΔRMSE = 1.23–2.36 g q⁻¹) and milking stages (ΔR² = 0.14–0.25, ΔRMSE = 5.57–10.02 g q⁻¹). For the third objective, the TESI, TVAE, and CTGAN were used to enlarge the four datasets. The TESI preserved the original probability distributions of the features in all four datasets. The C-TESI achieved the lowest mean squared error mean percentage (MAEP) on the oil palm (0.60–2.85%), rice (0.77–1.72%), and fertiliser datasets (2.04–2.21%). On the four datasets, the TESI kept the variance inflation factor (VIF) at roughly 10 or lower, either retaining a VIF range of 1.99–10.06 or reducing it to 1.55–6.66. It also either retained Spearman's rank correlation coefficient (rs) within 0.79–0.97 or increased it to 0.81–0.99.
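The VIF results above follow the common reading that a VIF above about 10 signals severe multicollinearity. As a minimal sketch, assuming the standard definition VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing feature j on all the other features, the VIFs equal the diagonal of the inverse correlation matrix. The function names below are hypothetical and the code uses only the Python standard library; it is not the thesis's implementation.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of equally long feature columns."""
    n = len(columns[0])
    centred = []
    for col in columns:
        mean = sum(col) / n
        centred.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in col)) for col in centred]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centred[i], centred[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: a feature is perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2), read off the diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


if __name__ == "__main__":
    # Hypothetical toy features with Pearson r = 0.8, so each VIF is 1 / (1 - 0.64).
    ndvi_like = [1.0, 2.0, 3.0, 4.0, 5.0]
    gndvi_like = [2.0, 1.0, 4.0, 3.0, 5.0]
    print([round(v, 2) for v in vif([ndvi_like, gndvi_like])])
```

For two features the formula reduces to 1 / (1 − r²), which makes it easy to see why VIs built from the same few bands, and hence strongly correlated, quickly push VIFs past 10.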
The DNN achieved its highest R² (0.77–0.99) and lowest RMSE (2.8 × 10¹ to 8.1 × 10⁵) on the four datasets when they were augmented with the TESI. The Q-TESI, C-TESI, and L-TESI outperformed the LN-TESI in preserving the features' original probability distributions, minimising the augmentation loss, reducing the VIF, increasing the rs, and decreasing DNN under- and overfitting. Overall, because most agronomic research derives vegetation indices from only a few sensor bands, the indices are highly collinear. Therefore, exploring the multilevel sensitivity of different MLs to multicollinearity may inform the methodological choices of future agronomic studies. Additionally, stable VI–biomass models can accurately reflect rice yield potential, and denoising the VIs may substantially improve them. Further, compared with the LN-TESI, whose interpolation error is proportional to the square of the distance between data points, the Q-TESI, C-TESI, and L-TESI reduce interpolation error. Consequently, the Q-TESI, C-TESI, and L-TESI may approximate nonlinear changes in crop phenology between time-spaced samples, thereby reducing sampling costs. They can also synthetically intensify non-time-series zonal sampling, which reduces sampling labour.
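The interpolation claim rests on a standard numerical-analysis result: piecewise linear interpolation has an error proportional to h², the square of the spacing between data points, whereas a local cubic interpolant has an error proportional to h⁴. The sketch below checks this numerically with ordinary Lagrange interpolation on sin(x). It illustrates the error-order argument only and is not the TESI, whose formulation the record does not give; all function names are hypothetical.

```python
import math


def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def local_interp(nodes, values, x, degree):
    """Interpolate at x with a polynomial through degree + 1 neighbouring nodes.

    Assumes equally spaced nodes; degree 1 is piecewise linear, degree 3 uses
    a centred four-point stencil, shifted inward at the ends of the grid.
    """
    n = len(nodes)
    h = nodes[1] - nodes[0]
    k = min(max(int((x - nodes[0]) / h), 0), n - 2)  # interval containing x
    start = min(max(k - (degree - 1) // 2, 0), n - degree - 1)
    idx = range(start, start + degree + 1)
    return lagrange_eval([nodes[i] for i in idx], [values[i] for i in idx], x)


def max_error(f, a, b, n_intervals, degree, samples=2000):
    """Largest absolute interpolation error of f on [a, b], sampled densely."""
    h = (b - a) / n_intervals
    nodes = [a + i * h for i in range(n_intervals + 1)]
    values = [f(v) for v in nodes]
    worst = 0.0
    for s in range(samples + 1):
        x = a + (b - a) * s / samples
        worst = max(worst, abs(local_interp(nodes, values, x, degree) - f(x)))
    return worst


if __name__ == "__main__":
    for degree, label in [(1, "linear"), (3, "cubic")]:
        coarse = max_error(math.sin, 0.0, math.pi, 8, degree)
        fine = max_error(math.sin, 0.0, math.pi, 16, degree)
        print(f"{label}: error ratio when h is halved = {coarse / fine:.1f}")
```

Halving the spacing should cut the linear error by a factor of about 4 and the cubic error by a much larger factor (about 16 asymptotically), which suggests why higher-order interpolators can track nonlinear phenological curves from sparser samples.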
first_indexed	2024-12-09T02:17:12Z
format	Thesis
id	upm.eprints-104544
institution	Universiti Putra Malaysia
language	English
last_indexed	2024-12-09T02:17:12Z
publishDate	2022
record_format	dspace
spelling	upm.eprints-104544 2024-09-27T07:13:04Z http://psasir.upm.edu.my/id/eprint/104544/ 2022-12 Thesis NonPeerReviewed text en http://psasir.upm.edu.my/id/eprint/104544/1/FP%202022%2070%20-%20IR.pdf Radhwane, Derraz (2022) Prediction of rice biomass using machine learning algorithms. Doctoral thesis, Universiti Putra Malaysia.
title	Prediction of rice biomass using machine learning algorithms
topic	Rice - Yields; Machine learning; Plant biomass
url	http://psasir.upm.edu.my/id/eprint/104544/1/FP%202022%2070%20-%20IR.pdf
work_keys_str_mv	AT radhwanederraz predictionofricebiomassusingmachinelearningalgorithms
