Groundwater contaminant source identification considering unknown boundary condition based on an automated machine learning surrogate

Bibliographic Details
Main Authors: Yaning Xu, Wenxi Lu, Zidong Pan, Chengming Luo, Yukun Bai, Shuwei Qiu
Format: Article
Language: English
Published: Elsevier 2024-01-01
Series: Geoscience Frontiers
Online Access: http://www.sciencedirect.com/science/article/pii/S1674987123001998
Description
Summary: Groundwater contamination source identification (GCSI) is a prerequisite for contamination risk evaluation and for efficient groundwater remediation programs. Previous GCSI studies have generally treated the boundary condition as a known variable. In many practical cases, however, the boundary condition is complicated and cannot be estimated accurately in advance. Treating it as known may deviate seriously from the actual situation and distort the identification results. Moreover, GCSI results are affected by multiple factors, including contaminant source information, model parameters and the boundary condition, so an inaccurate estimate of the boundary condition also degrades the estimates of the other factors. This study focuses on the unknown boundary condition and proposes, as its main innovation, to identify three types of unknown variables: contaminant source information, model parameters and the boundary condition. When the simulation–optimization (S-O) method is applied to GCSI, the heavy computational load is usually reduced by building surrogate models. Building a surrogate model, however, requires researchers to select a model and optimize its hyperparameters, which can be a lengthy process. In this study, an automated machine learning (AutoML) method is used to build the surrogate model; it automates model selection and hyperparameter optimization, greatly reducing manual effort and saving time. The accuracy of the AutoML surrogate model is compared with that of surrogate models built with the eXtreme Gradient Boosting (XGBoost), random forest (RF), extra trees regressor (ETR) and elastic net (EN) methods, which are the methods automatically selected during the AutoML process. The results show that the surrogate model constructed by the AutoML method is more accurate than those built with the other four methods. This study provides reliable and strong support for GCSI.
ISSN: 1674-9871
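
The summary describes automating two steps of surrogate construction: model selection and hyperparameter optimization. The idea can be illustrated with a minimal sketch, which is not the authors' AutoML pipeline and does not use XGBoost, RF, ETR or EN. It assumes a hypothetical `toy_simulator` in place of the groundwater simulation model, uses polynomial ridge regression as the only surrogate family, and picks the polynomial degree and regularization strength by hold-out validation. All function names and grid values are assumptions made for illustration.

```python
import numpy as np


def toy_simulator(x):
    """Hypothetical stand-in for an expensive simulation model (not from the paper)."""
    return np.sin(x[:, 0]) + 0.5 * x[:, 1] ** 2 - x[:, 0] * x[:, 1]


def poly_features(x, degree):
    """Bias column, per-variable powers up to `degree`, and one cross term if degree >= 2."""
    cols = [np.ones(len(x))]
    for d in range(1, degree + 1):
        cols.extend(x[:, j] ** d for j in range(x.shape[1]))
    if degree >= 2:
        cols.append(x[:, 0] * x[:, 1])
    return np.column_stack(cols)


def fit_ridge(features, y, alpha):
    """Closed-form ridge regression; the bias weight is penalized too, for simplicity."""
    gram = features.T @ features + alpha * np.eye(features.shape[1])
    return np.linalg.solve(gram, features.T @ y)


def r2_score(y_true, y_pred):
    """Coefficient of determination on a validation set."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def select_surrogate(x_train, y_train, x_val, y_val,
                     degrees=(1, 2, 3, 4), alphas=(1e-6, 1e-3, 1e-1)):
    """Try every (degree, alpha) pair and keep the one with the best validation R^2."""
    best = None
    for degree in degrees:
        f_train = poly_features(x_train, degree)
        f_val = poly_features(x_val, degree)
        for alpha in alphas:
            weights = fit_ridge(f_train, y_train, alpha)
            score = r2_score(y_val, f_val @ weights)
            if best is None or score > best["r2"]:
                best = {"degree": degree, "alpha": alpha, "r2": score, "weights": weights}
    return best


# Sample the (toy) simulator, then split into training and validation sets.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=(300, 2))
y = toy_simulator(x)
best = select_surrogate(x[:200], y[:200], x[200:], y[200:])
print(best["degree"], best["alpha"], round(float(best["r2"]), 4))
```

An AutoML framework would search a much larger space, with several model families and richer hyperparameter grids, but the selection loop follows the same pattern: fit each candidate on samples of the simulator, score it on held-out samples, and keep the best.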