Yield Prediction Models in Guangxi Sugarcane Planting Regions Based on Machine Learning Methods

Objective: Accurate prediction of changes in sugarcane yield in Guangxi can provide an important reference for government policy-making and a decision-making basis for farmers planning sugarcane planting, thereby improving sugarcane yield and quality and promoting the...

Bibliographic Details
Main Authors: SHI Jiefeng, HUANG Wei, FAN Xieyang, LI Xiuhua, LU Yangxu, JIANG Zhuhui, WANG Zeping, LUO Wei, ZHANG Muqing
Format: Article
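Language: English
Published: Editorial Office of Smart Agriculture 2023-06-01
Series: 智慧农业 (Smart Agriculture)
Subjects: meteorological factor; HP filter; sugarcane yield; BPNN model; LSTM model; machine learning
Online Access: http://www.smartag.net.cn/CN/10.12133/j.smartag.SA202304004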
_version_ 1797754819933372416
author SHI Jiefeng
HUANG Wei
FAN Xieyang
LI Xiuhua
LU Yangxu
JIANG Zhuhui
WANG Zeping
LUO Wei
ZHANG Muqing
collection DOAJ
description Objective: Accurate prediction of changes in sugarcane yield in Guangxi can provide an important reference for government policy-making and a decision-making basis for farmers planning sugarcane planting, thereby improving sugarcane yield and quality and promoting the development of the sugarcane industry. This research was conducted to provide scientific data support for sugar factories and related management departments and to explore the relationship between sugarcane yield and meteorological factors in the main sugarcane-producing areas of Guangxi Zhuang Autonomous Region. Methods: The study area included five sugarcane planting regions located in five different counties in Guangxi, China. The average yield per hectare of each planting region was provided by Guangxi Sugar Industry Group, which controls the sugar refineries of each planting region. Daily meteorological data covering 14 meteorological factors from 2002 to 2019 were acquired from the National Data Center for Meteorological Sciences to analyze their influence on sugarcane yield. Because meteorological factors can influence sugarcane growth differently during different time spans, a new kind of factor combining a meteorological factor with a time span was defined, such as the average precipitation in August or the average temperature from February to April. The inter-correlations among all meteorological factors of different time spans, and their correlations with yield, were then analyzed to screen out the key meteorological factors of sensitive time spans. After that, four algorithms, namely BP neural network (BPNN), support vector machine (SVM), random forest (RF), and long short-term memory (LSTM), were employed to establish sugarcane apparent yield prediction models for each planting region. Corresponding reference models based on the annual meteorological factors were also built.
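The factor construction and screening described in the Methods can be sketched as follows. This is a minimal illustration only, not the authors' code: the data layout (a dict keyed by `(year, month, day)`), the function names `window_mean`, `pearson`, and `screen_windows`, and the use of a plain Pearson coefficient are all assumptions made for the example.

```python
from math import sqrt
from statistics import mean


def window_mean(daily, year, start_month, end_month):
    """Average one meteorological factor over a month window of one year.

    `daily` maps (year, month, day) -> value (hypothetical layout).
    """
    values = [v for (y, m, _d), v in daily.items()
              if y == year and start_month <= m <= end_month]
    return mean(values)


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either side is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def screen_windows(daily, yields, windows):
    """Rank (start_month, end_month) windows by |correlation| with yield.

    `yields` maps year -> yield per hectare for one planting region.
    Returns a list of ((start, end), r) sorted from strongest to weakest.
    """
    years = sorted(yields)
    target = [yields[yr] for yr in years]
    scores = {}
    for start, end in windows:
        factor = [window_mean(daily, yr, start, end) for yr in years]
        scores[(start, end)] = pearson(factor, target)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

Calling `screen_windows` once per factor and per region, with windows such as `(8, 8)` for August or `(2, 4)` for February to April, yields the ranked list from which sensitive time spans could be chosen. Ranking by absolute value keeps strongly negative factors in view alongside strongly positive ones.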
Additionally, the meteorological yield of every planting region was extracted by HP filtering, and general meteorological yield prediction models were built on the data of all five planting regions using RF, SVM, BPNN, and LSTM, respectively. Results and Discussions: The correlation analysis showed that different planting regions have different sensitive meteorological factors and key time spans. The most representative meteorological factors were sunshine hours, precipitation, and atmospheric pressure. In Region 1, the strongest negative correlation with yield was observed for sunshine hours during October and November, while the strongest positive correlation was found for the minimum relative humidity in November. In Region 2, the strongest positive correlation with yield was observed for the average vapor pressure during February and March, whereas the strongest negative correlation was associated with precipitation in August and September. In Region 3, the strongest positive correlation with yield was found for the 20‒20 precipitation (the 24-hour total measured from 20:00 to 20:00) during August and September, while the strongest negative correlation was related to sunshine hours in the same period. In Region 4, the strongest positive correlation with yield was observed for the 20‒20 precipitation from March to December, whereas the strongest negative correlation was associated with the highest atmospheric pressure from August to December. In Region 5, the strongest positive correlation with yield was found for the average vapor pressure from June to August, whereas the strongest negative correlation was related to the lowest atmospheric pressure in February and March.
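The HP-filtering step mentioned above separates a yield series into a smooth trend and a residual. The sketch below is a plain-Python Hodrick-Prescott filter under stated assumptions, not the authors' implementation: treating the residual (actual yield minus trend) as the meteorological yield, the smoothing value `lam=100.0` (a common choice for annual data), and the names `solve` and `hp_filter` are all illustrative.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def hp_filter(y, lam=100.0):
    """Return (trend, cycle) minimising
    sum (y_t - trend_t)^2 + lam * sum (second difference of trend)^2.

    The trend solves (I + lam * K^T K) trend = y, where K is the
    second-difference operator. `lam` is an assumed smoothing value.
    """
    n = len(y)
    if n < 3:
        raise ValueError("HP filter needs at least 3 observations")
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    coef = (1.0, -2.0, 1.0)
    for r in range(n - 2):  # accumulate lam * K^T K row by row of K
        idx = (r, r + 1, r + 2)
        for i, ci in zip(idx, coef):
            for j, cj in zip(idx, coef):
                A[i][j] += lam * ci * cj
    trend = solve(A, [float(v) for v in y])
    cycle = [yi - ti for yi, ti in zip(y, trend)]
    return trend, cycle
```

With a series of annual yields for one region, `trend, cycle = hp_filter(series)` gives a trend component and a residual that sums to zero over the series. A perfectly linear series comes back with a zero residual, which is a quick sanity check on the implementation.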
For each specific planting region, the apparent yield prediction model based on sensitive meteorological factors during key time spans was clearly more accurate than the one based on annual average meteorological values. The LSTM model performed significantly better than the widely used classic BPNN, SVM, and RF models for both kinds of meteorological factors (under sensitive time spans or annually). The overall root mean square error (RMSE) and mean absolute percentage error (MAPE) of the LSTM model under key time spans were 10.34 t/ha and 6.85%, respectively, with a coefficient of determination (R²) of 0.8489 between the predicted and true values. For the general meteorological yield prediction models covering multiple sugarcane planting regions, the RF, SVM, and BPNN models achieved good results, and the best prediction performance went to the BPNN model, with an RMSE of 0.98 t/ha, a MAPE of 9.59%, and an R² of 0.965. The RMSE and MAPE of the LSTM model were 0.25 t/ha and 39.99%, respectively, and its R² was 0.77. Conclusions: Sensitive meteorological factors under key time spans were found to be more significantly correlated with yield than the annual average meteorological factors. The LSTM model performed better at apparent yield prediction for a specific planting region than the classic BPNN, SVM, and RF models, but the BPNN model showed better results than the other models in predicting meteorological yield over multiple sugarcane planting regions.
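The three evaluation metrics named above can be computed as in the following sketch. The function names are illustrative, and the R² shown is the common 1 − SS_res/SS_tot form, which is an assumption: the abstract does not say which definition of the coefficient of determination was used.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean square error, in the same unit as the yields (e.g. t/ha)."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Undefined when a true value is zero, and large when true values
    are close to zero.
    """
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


def r_squared(y_true, y_pred):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed form)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

RMSE is scale-dependent while MAPE is relative, so the two need not rank models the same way, particularly for a target such as a detrended residual whose values lie near zero.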
first_indexed 2024-03-12T17:37:58Z
format Article
id doaj.art-a7bf6058423b46fab86161970aef8b24
institution Directory Open Access Journal
issn 2096-8094
language English
last_indexed 2024-03-12T17:37:58Z
publishDate 2023-06-01
publisher Editorial Office of Smart Agriculture
record_format Article
series 智慧农业
spelling doaj.art-a7bf6058423b46fab86161970aef8b24 2023-08-04T06:21:50Z eng Editorial Office of Smart Agriculture 智慧农业 2096-8094 2023-06-01 5 2 82 92 10.12133/j.smartag.SA202304004 SA202304004
Yield Prediction Models in Guangxi Sugarcane Planting Regions Based on Machine Learning Methods
SHI Jiefeng (0), HUANG Wei (1), FAN Xieyang (2), LI Xiuhua (3), LU Yangxu (4), JIANG Zhuhui (5), WANG Zeping (6), LUO Wei (7), ZHANG Muqing (8)
0 School of Electrical Engineering, Guangxi University, Nanning 530004, China
1 School of Electrical Engineering, Guangxi University, Nanning 530004, China
2 School of Electrical Engineering, Guangxi University, Nanning 530004, China
3 School of Electrical Engineering, Guangxi University, Nanning 530004, China
4 School of Electrical Engineering, Guangxi University, Nanning 530004, China
5 Guangxi Sugar Industry Group, Nanning 530022, China
6 Sugarcane Research Institute, Guangxi Academy of Agricultural Sciences, Nanning 530007, China
7 School of Electrical Engineering, Guangxi University, Nanning 530004, China
8 Guangxi Key Laboratory of Sugarcane Biology, Guangxi University, Nanning 530004, China
http://www.smartag.net.cn/CN/10.12133/j.smartag.SA202304004
meteorological factor; hp filter; sugarcane yield; bpnn model; lstm model; machine learning
title Yield Prediction Models in Guangxi Sugarcane Planting Regions Based on Machine Learning Methods
topic meteorological factor
HP filter
sugarcane yield
BPNN model
LSTM model
machine learning
url http://www.smartag.net.cn/CN/10.12133/j.smartag.SA202304004