An Intelligent Data Analysis System Combining ARIMA and LSTM for Persistent Organic Pollutants Concentration Prediction
Main Authors: | Lu Yu, Chunxue Wu, Neal N. Xiong |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-02-01 |
Series: | Electronics |
Subjects: | |
Online Access: | https://www.mdpi.com/2079-9292/11/4/652 |
author | Lu Yu, Chunxue Wu, Neal N. Xiong |
collection | DOAJ |
description | Persistent Organic Pollutants (POPs) are toxic and difficult to degrade, and they cause serious harm to human life and the ecological environment. It is therefore important to build an intelligent prediction system that uses historical measurements and data analysis techniques to predict future POP concentrations accurately and in advance. Such predictions are highly significant for policy formulation, human health, environmental protection, and the sustainable development of society. Because POP concentration sequences contain both linear and nonlinear components, this paper proposes an intelligent data analysis system that combines the autoregressive integrated moving average (ARIMA) model and the long short-term memory (LSTM) network to analyze and predict POP concentrations in the Great Lakes region. ARIMA captures the linear components and LSTM processes the nonlinear components, which overcomes the limitations of single models. In addition, a one-class SVM algorithm is used to detect outliers during data preprocessing, and the Bayesian information criterion (BIC) and grid search are used to find the optimal parameter combinations for ARIMA and LSTM, respectively. Compared with single baseline models on multiple evaluation metrics, the proposed system achieves the smallest mean absolute error (MAE), root mean square error (RMSE), and symmetric mean absolute percentage error (SMAPE) on all datasets. It also tracks the trends of concentration changes well, and its predicted values are closer to the true values, indicating that it effectively improves prediction precision. Finally, the system is used to predict concentrations at sites in the Great Lakes region over the next 5 years. The predicted concentrations show large fluctuations in each year, but the overall trend is downward. |
first_indexed | 2024-03-09T22:06:55Z |
format | Article |
id | doaj.art-cf48ea3b19dc46bb9386ba726f564a46 |
institution | Directory of Open Access Journals |
issn | 2079-9292 |
language | English |
last_indexed | 2024-03-09T22:06:55Z |
publishDate | 2022-02-01 |
publisher | MDPI AG |
record_format | Article |
series | Electronics |
doi | 10.3390/electronics11040652 |
citation | Electronics, vol. 11, no. 4, article 652 (2022) |
affiliations | Lu Yu and Chunxue Wu: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China; Neal N. Xiong: Department of Mathematics and Computer Science, Northeastern State University, Tahlequah, OK 74464, USA |
title | An Intelligent Data Analysis System Combining ARIMA and LSTM for Persistent Organic Pollutants Concentration Prediction |
topic | data analysis; time series; LSTM model; ARIMA model; concentration prediction |
url | https://www.mdpi.com/2079-9292/11/4/652 |
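The description reports MAE, RMSE and SMAPE as the evaluation indicators used to compare the ARIMA-LSTM system with single baseline models. The record does not give the paper's exact formulas, so the following is only a minimal sketch assuming the common textbook definitions. SMAPE in particular has several variants; this one uses the denominator (|A_t| + |F_t|) / 2 and reports a percentage, and treating a point where both values are zero as zero error is an assumption of this sketch, not something stated in the record. The helper name `_check` is hypothetical.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    # Hypothetical helper: both series must be non-empty and aligned point by point.
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error: the average of |A_t - F_t|."""
    _check(actual, predicted)
    return sum(abs(a - f) for a, f in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error: the square root of the mean of (A_t - F_t)^2."""
    _check(actual, predicted)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, predicted)) / len(actual))


def smape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Symmetric MAPE in percent, using the denominator (|A_t| + |F_t|) / 2.

    Assumption: a point where both A_t and F_t are 0 contributes zero error.
    """
    _check(actual, predicted)
    total = 0.0
    for a, f in zip(actual, predicted):
        denom = (abs(a) + abs(f)) / 2
        if denom > 0:
            total += abs(f - a) / denom
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Toy numbers for illustration only, not data from the paper.
    actual = [2.0, 4.0, 6.0]
    predicted = [3.0, 4.0, 4.0]
    print(mae(actual, predicted))              # 1.0
    print(round(rmse(actual, predicted), 3))   # 1.291
    print(round(smape(actual, predicted), 2))  # 26.67
```

Lower values are better for all three metrics. RMSE weights large errors more heavily than MAE, while SMAPE is scale-free, which makes it convenient when comparing sites whose concentration levels differ.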