An Intelligent Data Analysis System Combining ARIMA and LSTM for Persistent Organic Pollutants Concentration Prediction

Bibliographic Details
Main Authors: Lu Yu, Chunxue Wu, Neal N. Xiong
Format: Article
Language: English
Published: MDPI AG 2022-02-01
Series: Electronics
Online Access: https://www.mdpi.com/2079-9292/11/4/652
Description
Summary: Persistent Organic Pollutants (POPs) are toxic and difficult to degrade, and they cause serious damage to human life and the ecological environment. It is therefore important to use intelligent methods and data analysis technologies to build a prediction system that forecasts future POPs concentrations accurately from historical measurements. Such forecasts matter for policy formulation, human health, environmental protection and the sustainable development of society. Because a POPs concentration sequence contains both linear and nonlinear components, this paper proposes an intelligent data analysis system that combines an autoregressive integrated moving average (ARIMA) model with a long short-term memory (LSTM) network to analyze and predict POPs concentrations in the Great Lakes region. ARIMA captures the linear components and LSTM processes the nonlinear components, which overcomes the deficiency of single models. A one-class SVM algorithm detects outliers during data preprocessing. The Bayesian information criterion and grid search are used to obtain the optimal parameter combinations of ARIMA and LSTM, respectively. The paper compares the proposed system with single baseline models using multiple evaluation indicators and finds that it has the smallest MAE, RMSE and SMAPE values on all datasets. The system also tracks the trends of concentration changes well, and its predicted values are closer to the true values, which shows that it improves prediction precision. Finally, the system is used to predict concentrations at sites in the Great Lakes region over the next 5 years. The predicted concentrations fluctuate widely within each year, but the overall trend is downward.
ISSN: 2079-9292
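
The summary ranks models by MAE, RMSE and SMAPE without defining them. The sketch below shows one common way to compute these three indicators in plain Python. It is an illustration, not the authors' code: the function names and the sample values are invented here, and SMAPE has several variants in the literature, so the form used (mean of |a - p| divided by the average of |a| and |p|, expressed in percent) is an assumption rather than the definition confirmed by the paper.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average size of the errors, in the data's units."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: like MAE, but penalizes large errors more."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent (range 0 to 200).

    Assumed variant: each term is |a - p| / ((|a| + |p|) / 2).
    A term where both values are zero is counted as zero error.
    """
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        if denom > 0:
            total += abs(a - p) / denom
    return 100.0 * total / len(actual)


# Made-up values for illustration only; these are not measurements from the paper.
actual = [2.0, 4.0, 6.0]
predicted = [1.0, 4.0, 8.0]

print(f"MAE:   {mae(actual, predicted):.4f}")
print(f"RMSE:  {rmse(actual, predicted):.4f}")
print(f"SMAPE: {smape(actual, predicted):.4f}")
```

Lower is better for all three. MAE and RMSE are in the units of the concentration data, while SMAPE is scale-free, which is why it is useful when comparing results across datasets with different concentration ranges.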