PM<sub>2.5</sub> Concentration Forecasting Using Weighted Bi-LSTM and Random Forest Feature Importance-Based Feature Selection

Particulate matter (PM) in the air can cause various health problems and diseases in humans. In particular, the smaller size of PM<sub>2.5</sub>...


Bibliographic Details
Main Authors:	Baekcheon Kim, Eunkyeong Kim, Seunghwan Jung, Minseok Kim, Jinyong Kim, Sungshin Kim
Format:	Article
Language:	English
Published:	MDPI AG 2023-06-01
Series:	Atmosphere
Subjects:	PM2.5 concentration forecasting; bidirectional long short-term memory; random forest; weight method
Online Access:	https://www.mdpi.com/2073-4433/14/6/968

author	Baekcheon Kim, Eunkyeong Kim, Seunghwan Jung, Minseok Kim, Jinyong Kim, Sungshin Kim
author_facet	Baekcheon Kim, Eunkyeong Kim, Seunghwan Jung, Minseok Kim, Jinyong Kim, Sungshin Kim
author_sort	Baekcheon Kim
collection	DOAJ
description	Particulate matter (PM) in the air can cause various health problems and diseases in humans. In particular, the smaller size of PM<sub>2.5</sub> particles enables them to penetrate deep into the lungs, causing severe health impacts. Exposure to PM<sub>2.5</sub> can result in respiratory, cardiovascular, and allergic diseases, and prolonged exposure has also been linked to an increased risk of cancer, including lung cancer. Therefore, forecasting the PM<sub>2.5</sub> concentration in the surrounding environment is crucial for preventing these adverse health effects. This paper proposes a method for forecasting the PM<sub>2.5</sub> concentration 1 h ahead using bidirectional long short-term memory (Bi-LSTM). The proposed method involves selecting input variables based on the feature importance calculated by random forest, classifying the data to assign weight variables to reduce bias, and forecasting the PM<sub>2.5</sub> concentration using Bi-LSTM. To evaluate the performance of the proposed method, two case studies were conducted. The first compares forecasting performance according to preprocessing. The second compares forecasting performance between deep learning models (long short-term memory, gated recurrent unit, and Bi-LSTM) and conventional machine learning models (multi-layer perceptron, support vector machine, decision tree, and random forest). In case study 1, the performance indices of the proposed method improve (RMSE: 3.98%p, MAE: 5.87%p, RRMSE: 3.96%p, and R<sup>2</sup>: 0.72%p) because weights are given according to the input variables before the forecasting is performed. In case study 2, Bi-LSTM, which considers both directions (forward and backward), forecasts effectively compared with conventional models (RMSE: 2.70, MAE: 0.84, RRMSE: 1.97, R<sup>2</sup>: 0.16). These results show that the proposed method can effectively forecast PM<sub>2.5</sub> even if the data in the high-concentration section is insufficient.
first_indexed	2024-03-11T02:48:03Z
format	Article
id	doaj.art-c4ae2931292545c98eb77c10461b3ff0
institution	Directory of Open Access Journals
issn	2073-4433
language	English
last_indexed	2024-03-11T02:48:03Z
publishDate	2023-06-01
publisher	MDPI AG
record_format	Article
series	Atmosphere
spelling	doaj.art-c4ae2931292545c98eb77c10461b3ff0; 2023-11-18T09:14:24Z; eng; MDPI AG; Atmosphere; 2073-4433; 2023-06-01; 14; 6; 968; 10.3390/atmos14060968; PM<sub>2.5</sub> Concentration Forecasting Using Weighted Bi-LSTM and Random Forest Feature Importance-Based Feature Selection; Baekcheon Kim, Eunkyeong Kim, Seunghwan Jung, Minseok Kim, Jinyong Kim, Sungshin Kim (all: Department of Electrical and Electronics Engineering, Pusan National University, Busan 46241, Republic of Korea); https://www.mdpi.com/2073-4433/14/6/968; PM2.5 concentration forecasting; bidirectional long short-term memory; random forest; weight method
title	PM<sub>2.5</sub> Concentration Forecasting Using Weighted Bi-LSTM and Random Forest Feature Importance-Based Feature Selection
topic	PM2.5 concentration forecasting; bidirectional long short-term memory; random forest; weight method
url	https://www.mdpi.com/2073-4433/14/6/968


Similar Items