PM2.5 Concentration Forecasting Using Weighted Bi-LSTM and Random Forest Feature Importance-Based Feature Selection

Bibliographic Details
Main Authors: Baekcheon Kim, Eunkyeong Kim, Seunghwan Jung, Minseok Kim, Jinyong Kim, Sungshin Kim
Format: Article
Language: English
Published: MDPI AG 2023-06-01
Series: Atmosphere
Subjects:
Online Access: https://www.mdpi.com/2073-4433/14/6/968
author Baekcheon Kim
Eunkyeong Kim
Seunghwan Jung
Minseok Kim
Jinyong Kim
Sungshin Kim
collection DOAJ
description Particulate matter (PM) in the air can cause various health problems and diseases in humans. In particular, the small size of PM2.5 particles enables them to penetrate deep into the lungs, causing severe health impacts. Exposure to PM2.5 can result in respiratory, cardiovascular, and allergic diseases, and prolonged exposure has also been linked to an increased risk of cancer, including lung cancer. Therefore, forecasting the PM2.5 concentration in the surrounding environment is crucial for preventing these adverse health effects. This paper proposes a method for forecasting the PM2.5 concentration 1 h ahead using bidirectional long short-term memory (Bi-LSTM). The proposed method selects input variables based on the feature importance calculated by a random forest, classifies the data to assign weight variables that reduce bias, and forecasts the PM2.5 concentration using Bi-LSTM. Two case studies were conducted to evaluate the proposed method. The first compares forecasting performance across preprocessing methods; the second compares the forecasting performance of deep learning models (long short-term memory, gated recurrent unit, and Bi-LSTM) with that of conventional machine learning models (multi-layer perceptron, support vector machine, decision tree, and random forest). In case study 1, the proposed method improves the performance indices (RMSE: 3.98%p, MAE: 5.87%p, RRMSE: 3.96%p, and R²: 0.72%p) because weights are assigned according to the input variables before forecasting is performed. In case study 2, Bi-LSTM, which processes the sequence in both directions (forward and backward), forecasts more effectively than the conventional models (RMSE: 2.70, MAE: 0.84, RRMSE: 1.97, R²: 0.16). These results indicate that the proposed method can effectively forecast PM2.5 even when data in the high-concentration range are scarce.
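The abstract reports results in terms of four performance indices: RMSE, MAE, RRMSE, and R². The snippet below is a minimal sketch of how these indices are commonly computed from paired observed and forecast PM2.5 values. It assumes RRMSE is defined as RMSE divided by the mean of the observed values, expressed in percent. That is one common definition, and the paper's exact formulas may differ. The function name `evaluate_forecast` is hypothetical.

```python
import math
from typing import Dict, Sequence


def evaluate_forecast(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Return RMSE, MAE, RRMSE (%) and R^2 for paired observations and forecasts.

    Assumption: RRMSE = RMSE / mean(observed) * 100 (one common definition).
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and of equal length")

    n = len(observed)
    # Residuals: observed minus forecast value at each time step.
    errors = [o - p for o, p in zip(observed, predicted)]

    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)                 # root mean squared error
    mae = sum(abs(e) for e in errors) / n        # mean absolute error

    mean_obs = sum(observed) / n
    rrmse = rmse / mean_obs * 100.0              # RMSE relative to the observed mean, in %

    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                   # coefficient of determination

    return {"RMSE": rmse, "MAE": mae, "RRMSE": rrmse, "R2": r2}


if __name__ == "__main__":
    # Toy values for illustration only; not data from the paper.
    print(evaluate_forecast([10, 20, 30, 40], [12, 18, 33, 37]))
```

For the toy input, the residuals are -2, 2, -3, and 3. The MAE is therefore 2.5, and R² is 1 - 26/500 = 0.948. Lower RMSE, MAE, and RRMSE values indicate better forecasts, while a higher R² indicates better forecasts.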
first_indexed 2024-03-11T02:48:03Z
format Article
id doaj.art-c4ae2931292545c98eb77c10461b3ff0
institution Directory of Open Access Journals
issn 2073-4433
language English
last_indexed 2024-03-11T02:48:03Z
publishDate 2023-06-01
publisher MDPI AG
record_format Article
series Atmosphere
doi 10.3390/atmos14060968
volume 14
issue 6
article_number 968
affiliation Department of Electrical and Electronics Engineering, Pusan National University, Busan 46241, Republic of Korea (all authors)
title PM2.5 Concentration Forecasting Using Weighted Bi-LSTM and Random Forest Feature Importance-Based Feature Selection
topic PM2.5 concentration forecasting
bidirectional long short-term memory
random forest
weight method
url https://www.mdpi.com/2073-4433/14/6/968