Prediction of rice biomass using machine learning algorithms

Bibliographic Details
Main Author: Radhwane, Derraz
Format: Thesis
Language: English
Published: 2022
Subjects: Rice - Yields; Machine learning; Plant biomass
Online Access: http://psasir.upm.edu.my/id/eprint/104544/1/FP%202022%2070%20-%20IR.pdf
_version_ 1825939057583063040
author Radhwane, Derraz
author_facet Radhwane, Derraz
author_sort Radhwane, Derraz
collection UPM
description Conventional rice sampling methods are effective. However, they are destructive, laborious, time-consuming, impractical for large fields, and subject to human error. Unmanned aerial vehicles (UAVs) may address these issues. Machine learning algorithms (MLs) can predict rice biomass from UAV-based vegetation indices (VIs). Nevertheless, VIs are highly collinear and noisy, and collecting large VI datasets is expensive. These issues affect the MLs' model performance, stability (under/overfitting), variance, and confidence. This study aims to: (i) compare the base and ensemble MLs' model performance, variance, stability, and confidence for predicting rice biomass using collinear (multicollinearity context (MCC)) and non-collinear (non-multicollinearity context (NMCC)) VIs; (ii) compare the predictability of rice above-ground biomass (TAGB) from noised and Kalman-filter-denoised VIs using the histogram gradient boosting regressor (HGBR); (iii) develop a trigonometric-Euclidean-smoother interpolator (TESI), including linear (LN-TESI), cubic (C-TESI), quadratic (Q-TESI), and logarithmic (L-TESI) interpolators, for augmenting continuous time-series and non-time-series VI data, and compare them to the tabular variational autoencoder (TVAE) and the conditional tabular generative adversarial network (CTGAN) for preventing under/overfitting of a deep neural network (DNN). A split-plot randomised complete block design (RCBD) experiment was conducted in a rice granary at Terengganu, Malaysia, with 120 quadrants. Each quadrant provided five rice biomass traits during the tillering, booting, and milking stages. A MicaSense Red-Edge multispectral camera mounted on a DJI quadcopter drone was used to acquire the blue, green, red, red-edge, and near-infrared (NIR) bands and to extract the VI values corresponding to each quadrant. Besides the biomass dataset, the non-time-series fertiliser dataset and the time-series oil palm and rice datasets were also collected to validate the TESI, TVAE, and CTGAN results.
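To make the collinearity issue concrete, the following is a minimal sketch of how two common VIs can be derived from band reflectances and how strongly they may correlate. It is not the thesis's pipeline: the choice of NDVI and GNDVI, the reflectance values, and the helper names (`ndvi`, `gndvi`, `pearson`, `vif_two_predictors`) are all assumptions made for illustration. The VIF shown uses the two-predictor special case, where VIF = 1 / (1 - r^2).

```python
from math import sqrt

def ndvi(nir, red):
    """Normalised difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)

def gndvi(nir, green):
    """Green NDVI from NIR and green reflectance."""
    return (nir - green) / (nir + green)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def vif_two_predictors(x, y):
    """VIF when only two predictors are present: 1 / (1 - r^2)."""
    r = pearson(x, y)
    return 1.0 / (1.0 - r * r)

# Hypothetical per-quadrant reflectances (illustrative values, not thesis data).
nir = [0.45, 0.50, 0.55, 0.60, 0.52]
red = [0.08, 0.07, 0.05, 0.04, 0.06]
green = [0.10, 0.09, 0.08, 0.07, 0.09]

ndvi_values = [ndvi(n, r) for n, r in zip(nir, red)]
gndvi_values = [gndvi(n, g) for n, g in zip(nir, green)]

print("correlation:", round(pearson(ndvi_values, gndvi_values), 3))
print("VIF:", round(vif_two_predictors(ndvi_values, gndvi_values), 1))
```

Because both indices share the NIR band, they tend to move together, which is one reason VIs built from a handful of bands end up with high VIF values.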
For the first objective, the MLs' model performance and stability were better in MCC than in NMCC for predicting all rice biomass traits. The ensemble MLs outperformed the base MLs for predicting all rice biomass traits in MCC and NMCC. All base and ensemble MLs achieved inconsistent patterns of coefficient of determination (R²) and root mean squared error (RMSE) variances in MCC and NMCC. Multicollinearity and the base-ensemble MLs concept did not affect the model confidence; rather, the latter was subject to the cross-effects of the ML and dataset characteristics. For the second objective, the denoised VIs (R² = 0.74–0.95, RMSE = 2.43–13.94 g q⁻¹) outperformed the noised VIs (R² = 0.63–0.90, RMSE = 3.28–17.91 g q⁻¹) for the TAGB prediction. The denoised VIs achieved the highest R² values at the booting stage (R² = 0.93–0.95, RMSE = 8.22–9.30 g q⁻¹), then at the tillering stage (R² = 0.75–0.84, RMSE = 2.43–2.96 g q⁻¹), and then at the milking stage (R² = 0.74–0.80, RMSE = 13.34–13.94 g q⁻¹). The HGBR achieved the lowest overfitting on the denoised VIs at the booting stage, with a training-testing change in R² (ΔR²) of 0.02–0.09 and a training-testing change in RMSE (ΔRMSE) of 1.93–6.54 g q⁻¹, then at the tillering stage (ΔR² = 0.08–0.21, ΔRMSE = 1.23–2.36 g q⁻¹), and then at the milking stage (ΔR² = 0.14–0.25, ΔRMSE = 5.57–10.02 g q⁻¹). For the third objective, the TESI, TVAE, and CTGAN were applied to increase the four datasets' sizes. The TESI retained the features' original probability distribution in the four datasets. The C-TESI achieved the lowest mean squared error mean percentage (MAEP) on the oil palm (0.60–2.85%), rice (0.77–1.72%), and fertiliser datasets (2.04–2.21%). The TESI retained the variance inflation factor (VIF) ranges less than 10 on the four datasets; the TESI retained a VIF range of 1.99–10.06 or reduced the VIF range to 1.55–6.66. Furthermore, the TESI retained the Spearman's r (rs) range of 0.79–0.97 or increased it to 0.81–0.99 on the four datasets.
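The Kalman-filter denoising referred to under the second objective can be illustrated with a minimal scalar sketch. The record does not state how the filter was configured, so the random-walk state model, the variance parameters (`process_var`, `measurement_var`), and the sample VI values below are assumptions, not the thesis's settings.

```python
def kalman_denoise(values, process_var=1e-3, measurement_var=1e-2):
    """Scalar Kalman filter with a random-walk state model.

    process_var and measurement_var are hypothetical tuning values.
    """
    estimate = values[0]   # start from the first observation
    error_var = 1.0        # large initial uncertainty
    smoothed = []
    for observation in values:
        # Predict: the state is assumed constant, so only uncertainty grows.
        error_var += process_var
        # Update: blend prediction and observation using the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (observation - estimate)
        error_var *= (1.0 - gain)
        smoothed.append(estimate)
    return smoothed

# Hypothetical noisy VI readings for one quadrant (illustrative only).
noisy_vi = [0.70, 0.74, 0.66, 0.73, 0.67, 0.71, 0.69]
denoised_vi = kalman_denoise(noisy_vi)

print("noisy spread:   ", round(max(noisy_vi) - min(noisy_vi), 3))
print("denoised spread:", round(max(denoised_vi) - min(denoised_vi), 3))
```

A larger `measurement_var` relative to `process_var` makes the filter trust the observations less and smooth more; the right balance depends on the sensor noise, which this sketch does not model.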
The DNN achieved the highest R² (0.77–0.99) and lowest RMSE ranges (2.8E+01–8.1E+05) on the four datasets augmented with the TESI. The Q-TESI, C-TESI, and L-TESI outperformed the LN-TESI in retaining the features' original probability distribution, minimising the augmentation loss, reducing the VIF, increasing the rs, and decreasing the DNN under- and overfitting. Overall, because most agronomic research relies on a few sensor bands, vegetation indices are highly collinear. Therefore, exploring the multilevel sensitivity of different MLs to multicollinearity may inform the methodological choices of several future agronomic studies. Additionally, stable VI-biomass models accurately reflect rice yield potential, and VI denoising may significantly improve them. Further, compared to the LN-TESI, the Q-TESI, C-TESI, and L-TESI reduce the dependence of the interpolation error on the square of the distance between the data points. Consequently, the Q-TESI, C-TESI, and L-TESI may approximate the nonlinear changes of crop phenology in time-spaced sampling, thereby reducing the cost of sampling for scientists. Furthermore, they intensify non-time-series zonal synthetic sampling, which reduces sampling labour.
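The remark about interpolation error growing with the square of the distance between data points can be checked with plain linear interpolation. The sketch below is not the TESI, whose formulation is not given in this record; it is a generic linear interpolator used for augmentation, with hypothetical helper names (`linear_interpolate`, `augment_linear`, `midpoint_error`) and an assumed test curve f(x) = x**2.

```python
def linear_interpolate(x0, y0, x1, y1, x):
    """Value at x on the straight line through (x0, y0) and (x1, y1)."""
    t = (x - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)

def augment_linear(xs, ys, points_between=1):
    """Insert evenly spaced synthetic points between consecutive samples."""
    out_x, out_y = [], []
    for i in range(len(xs) - 1):
        x0, y0, x1, y1 = xs[i], ys[i], xs[i + 1], ys[i + 1]
        out_x.append(x0)
        out_y.append(y0)
        for k in range(1, points_between + 1):
            x = x0 + (x1 - x0) * k / (points_between + 1)
            out_x.append(x)
            out_y.append(linear_interpolate(x0, y0, x1, y1, x))
    out_x.append(xs[-1])
    out_y.append(ys[-1])
    return out_x, out_y

def midpoint_error(h):
    """Linear-interpolation error at the midpoint of [0, h] for f(x) = x**2."""
    approx = linear_interpolate(0.0, 0.0, h, h ** 2, h / 2)
    return abs(approx - (h / 2) ** 2)

# Doubling the sampling gap quadruples the error on this curve.
print(midpoint_error(1.0), midpoint_error(2.0))

# Hypothetical time-spaced samples: one synthetic point per gap.
print(augment_linear([0, 2, 4], [0, 4, 16], points_between=1))
```

On a curved signal, a straight-line fill between widely spaced samples drifts from the true curve roughly with the square of the gap, which is the motivation usually given for higher-order interpolators when sampling dates are far apart.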
first_indexed 2024-12-09T02:17:12Z
format Thesis
id upm.eprints-104544
institution Universiti Putra Malaysia
language English
last_indexed 2024-12-09T02:17:12Z
publishDate 2022
record_format dspace
spelling upm.eprints-104544 2024-09-27T07:13:04Z http://psasir.upm.edu.my/id/eprint/104544/ Prediction of rice biomass using machine learning algorithms Radhwane, Derraz 2022-12 Thesis NonPeerReviewed text en http://psasir.upm.edu.my/id/eprint/104544/1/FP%202022%2070%20-%20IR.pdf Radhwane, Derraz (2022) Prediction of rice biomass using machine learning algorithms. Doctoral thesis, Universiti Putra Malaysia. Rice - Yields Machine learning Plant biomass English
spellingShingle Rice - Yields
Machine learning
Plant biomass
Radhwane, Derraz
Prediction of rice biomass using machine learning algorithms
title Prediction of rice biomass using machine learning algorithms
topic Rice - Yields
Machine learning
Plant biomass
url http://psasir.upm.edu.my/id/eprint/104544/1/FP%202022%2070%20-%20IR.pdf