An Intelligent Data Analysis System Combining ARIMA and LSTM for Persistent Organic Pollutants Concentration Prediction

Persistent Organic Pollutants (POPs) are toxic and difficult to degrade, and they can cause severe damage to human life and the ecological environment. Therefore, based on historical measurements, it is important to use intelligent methods and data analysis technologies to build an intelligent predicti...

Bibliographic Details
Main Authors: Lu Yu, Chunxue Wu, Neal N. Xiong
Format: Article
Language: English
Published: MDPI AG 2022-02-01
Series: Electronics
Subjects: data analysis, time series, LSTM model, ARIMA model, concentration prediction
Online Access: https://www.mdpi.com/2079-9292/11/4/652
author Lu Yu
Chunxue Wu
Neal N. Xiong
collection DOAJ
description Persistent Organic Pollutants (POPs) are toxic and difficult to degrade, and they can cause severe damage to human life and the ecological environment. It is therefore important to use intelligent methods and data analysis technologies to build a prediction system that forecasts future POPs concentrations accurately from historical measurements. Such forecasts matter for policy formulation, human health, environmental protection and the sustainable development of society. Because the POPs concentration sequence contains both linear and nonlinear components, this paper proposes an intelligent data analysis system combining an autoregressive integrated moving average (ARIMA) model and a long short-term memory (LSTM) network to analyze and predict POPs concentrations in the Great Lakes region. ARIMA captures the linear components while LSTM handles the nonlinear components, which overcomes the deficiencies of single models. A one-class SVM algorithm is used to detect outliers during data preprocessing. The Bayesian information criterion and grid search are used to obtain the optimal parameter combinations for ARIMA and LSTM, respectively. The paper compares the proposed system with single baseline models using multiple evaluation indicators and finds that the system has the smallest MAE, RMSE and SMAPE values on all datasets. The system also predicts the trends of concentration changes well, and its predicted values are closer to the true values, which shows that it can effectively improve prediction precision. Finally, the system is used to predict concentration values at sites in the Great Lakes region for the next 5 years. The predicted concentrations fluctuate widely within each year, but the overall trend is downward.
first_indexed 2024-03-09T22:06:55Z
format Article
id doaj.art-cf48ea3b19dc46bb9386ba726f564a46
institution Directory Open Access Journal
issn 2079-9292
language English
last_indexed 2024-03-09T22:06:55Z
publishDate 2022-02-01
publisher MDPI AG
record_format Article
series Electronics
spelling doaj.art-cf48ea3b19dc46bb9386ba726f564a46
2023-11-23T19:40:49Z
eng
MDPI AG
Electronics
2079-9292
2022-02-01
volume 11, issue 4, article 652
DOI 10.3390/electronics11040652
An Intelligent Data Analysis System Combining ARIMA and LSTM for Persistent Organic Pollutants Concentration Prediction
Lu Yu, School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
Chunxue Wu, School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
Neal N. Xiong, Department of Mathematics and Computer Science, Northeastern State University, Tahlequah, OK 74464, USA
https://www.mdpi.com/2079-9292/11/4/652
data analysis
time series
LSTM model
ARIMA model
concentration prediction
title An Intelligent Data Analysis System Combining ARIMA and LSTM for Persistent Organic Pollutants Concentration Prediction
topic data analysis
time series
LSTM model
ARIMA model
concentration prediction
url https://www.mdpi.com/2079-9292/11/4/652