Summary: Machine learning is of great value for analyzing soil heavy metal pollution risk and for its scientific prevention and control. In this paper, taking a selenium (Se)-rich area as the research object, an improved Genetic Algorithm (GA)–Back Propagation (BP) algorithm was used to construct a risk assessment model for cadmium (Cd) pollution in this area. First, the Cd and Se contents of the soil in the study area were statistically analyzed using descriptive statistics and correlation analysis. Then, a three-layer BP neural network was designed and optimized with the GA. The coding length of each GA individual was calculated from the connection weights and thresholds of the network built on the Cd and Se data. Based on 97 groups of field data from this area, the experimental results show that the GA-optimized BP model converges faster and maintains good generalization ability on the test sample points. Compared with the multiple linear regression model (MLRM) and M5, GA-BP reduces the RMSE by 64.84, 52.12, 49.53, and 63.18%. On the whole, the GA-BP neural network model estimates the Cd pollution status in different areas more accurately than the other three regression models. Across the whole research region, the samples in the safe, relatively safe, light pollution, moderate pollution, and severe pollution intervals accounted for 4.12, 8.24, 42.26, 17.52, and 27.86%, respectively. The predicted soil Cd pollution levels therefore show that only 12.36% of the samples (the safe and relatively safe intervals combined) were free of Cd pollution risk, while the largest share of samples fell in the light pollution interval. Because agriculture has huge potential for carbon sequestration and emission reduction, planting Se-rich, low-Cd crops in these areas can not only promote the development of local Se-rich industries but also contribute to carbon sequestration and emission reduction.
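The GA-BP scheme summarized above (a GA searching over the network's connection weights and thresholds, followed by BP training) can be sketched as follows. This is a minimal, hypothetical Python illustration, not the authors' implementation: the 2-5-1 layer sizes, tanh hidden activation, real-valued encoding, tournament selection, arithmetic crossover, Gaussian mutation, all hyperparameters, and the synthetic data are assumptions, since none of them are given in the text. The helper `chromosome_length` shows how the individual coding length follows from the weight and threshold counts; for a 2-5-1 network it is 2×5 + 5 + 5×1 + 1 = 21 genes.

```python
import math
import random

# Hypothetical 2-5-1 network: the real input features and hidden size are not stated.
N_IN, N_HID = 2, 5


def chromosome_length(n_in, n_hid, n_out=1):
    """Genes = input-hidden weights + hidden thresholds + hidden-output weights + output thresholds."""
    return n_in * n_hid + n_hid + n_hid * n_out + n_out


def unpack(g):
    """Split a flat chromosome into (w1, b1, w2, b2) for the single-output network."""
    w1 = [g[h * N_IN:(h + 1) * N_IN] for h in range(N_HID)]
    i = N_IN * N_HID
    return w1, g[i:i + N_HID], g[i + N_HID:i + 2 * N_HID], g[i + 2 * N_HID]


def forward(g, x):
    """Tanh hidden layer, linear output; returns (prediction, hidden activations)."""
    w1, b1, w2, b2 = unpack(g)
    hid = [math.tanh(sum(w * xi for w, xi in zip(w1[h], x)) + b1[h]) for h in range(N_HID)]
    return sum(w * a for w, a in zip(w2, hid)) + b2, hid


def rmse(g, data):
    return math.sqrt(sum((forward(g, x)[0] - y) ** 2 for x, y in data) / len(data))


def ga_initial_weights(data, pop_size=30, generations=40, p_mut=0.1, seed=0):
    """Real-coded GA searching for good starting weights/thresholds (lower RMSE = fitter)."""
    rng = random.Random(seed)
    n_genes = chromosome_length(N_IN, N_HID)
    pop = [[rng.uniform(-1, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda g: rmse(g, data))
        nxt = pop[:2]  # elitism: carry the two best individuals over unchanged
        while len(nxt) < pop_size:
            # Tournament of 3: pop is sorted by error, so the smallest index wins.
            a = pop[min(rng.sample(range(pop_size), 3))]
            b = pop[min(rng.sample(range(pop_size), 3))]
            alpha = rng.random()  # arithmetic crossover
            child = [alpha * u + (1 - alpha) * v for u, v in zip(a, b)]
            # Gaussian mutation applied to a random fraction of genes.
            child = [c + rng.gauss(0, 0.3) if rng.random() < p_mut else c for c in child]
            nxt.append(child)
        pop = nxt
    return min(pop, key=lambda g: rmse(g, data))


def bp_train(g, data, lr=0.05, epochs=200):
    """Plain online back-propagation starting from the GA chromosome."""
    g = list(g)
    for _ in range(epochs):
        for x, y in data:
            out, hid = forward(g, x)
            err = out - y  # derivative of 0.5 * (out - y)**2 with respect to out
            _, _, w2, _ = unpack(g)
            deltas = [err * w2[h] * (1 - hid[h] ** 2) for h in range(N_HID)]
            # Gradient laid out in the same gene order as unpack(): w1, b1, w2, b2.
            grad = [d * xi for d in deltas for xi in x] + deltas + [err * a for a in hid] + [err]
            g = [gi - lr * d for gi, d in zip(g, grad)]
    return g


# Synthetic stand-in data (NOT the paper's 97 field samples).
rng = random.Random(1)
samples = []
for _ in range(60):
    x = (rng.random(), rng.random())
    samples.append((x, 0.6 * x[0] - 0.4 * x[1] ** 2 + 0.2))
train, test = samples[:40], samples[40:]

g0 = ga_initial_weights(train)
g1 = bp_train(g0, train)
print(f"GA-only test RMSE: {rmse(g0, test):.4f}")
print(f"GA-BP test RMSE:   {rmse(g1, test):.4f}")
```

In this sketch the GA only supplies a starting point and BP then refines it by gradient descent. Splitting the work this way (global search for initial weights, local gradient refinement) is the usual motivation for GA-BP hybrids, which aim to make BP less sensitive to random initialization; the held-out `test` split stands in for the test sample points used to judge generalization.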