PM<sub>2.5</sub> Concentration Forecasting Using Weighted Bi-LSTM and Random Forest Feature Importance-Based Feature Selection
Particulate matter (PM) in the air can cause various health problems and diseases in humans. In particular, the smaller size of PM<sub>2.5</sub>...
Main Authors: | Baekcheon Kim, Eunkyeong Kim, Seunghwan Jung, Minseok Kim, Jinyong Kim, Sungshin Kim |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-06-01 |
Series: | Atmosphere |
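Subjects: | PM2.5 concentration forecasting; bidirectional long short-term memory; random forest; weight method |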
Online Access: | https://www.mdpi.com/2073-4433/14/6/968 |
_version_ | 1797596190696538112 |
---|---|
author | Baekcheon Kim Eunkyeong Kim Seunghwan Jung Minseok Kim Jinyong Kim Sungshin Kim |
author_sort | Baekcheon Kim |
collection | DOAJ |
description | Particulate matter (PM) in the air can cause various health problems and diseases in humans. In particular, the small size of PM<sub>2.5</sub> particles enables them to penetrate deep into the lungs, causing severe health impacts. Exposure to PM<sub>2.5</sub> can result in respiratory, cardiovascular, and allergic diseases, and prolonged exposure has also been linked to an increased risk of cancer, including lung cancer. Therefore, forecasting the PM<sub>2.5</sub> concentration in the surrounding environment is crucial for preventing these adverse health effects. This paper proposes a method for forecasting the PM<sub>2.5</sub> concentration 1 h ahead using bidirectional long short-term memory (Bi-LSTM). The proposed method involves selecting input variables based on the feature importance calculated by random forest, classifying the data to assign weight variables to reduce bias, and forecasting the PM<sub>2.5</sub> concentration using Bi-LSTM. To evaluate the performance of the proposed method, two case studies were conducted. The first compares forecasting performance according to preprocessing. The second compares forecasting performance between deep learning models (long short-term memory, gated recurrent unit, and Bi-LSTM) and conventional machine learning models (multi-layer perceptron, support vector machine, decision tree, and random forest). In case study 1, the performance indices of the proposed method improve (RMSE: 3.98%p, MAE: 5.87%p, RRMSE: 3.96%p, and R<sup>2</sup>: 0.72%p) because weights are given according to the input variables before the forecasting is performed. In case study 2, Bi-LSTM, which considers both directions (forward and backward), forecasts effectively when compared to conventional models (RMSE: 2.70, MAE: 0.84, RRMSE: 1.97, R<sup>2</sup>: 0.16). The proposed method can therefore effectively forecast PM<sub>2.5</sub> even if the data in the high-concentration section is insufficient. |
first_indexed | 2024-03-11T02:48:03Z |
format | Article |
id | doaj.art-c4ae2931292545c98eb77c10461b3ff0 |
institution | Directory Open Access Journal |
issn | 2073-4433 |
language | English |
last_indexed | 2024-03-11T02:48:03Z |
publishDate | 2023-06-01 |
publisher | MDPI AG |
record_format | Article |
series | Atmosphere |
spelling | doaj.art-c4ae2931292545c98eb77c10461b3ff0; 2023-11-18T09:14:24Z; eng; MDPI AG; Atmosphere; 2073-4433; 2023-06-01; vol. 14, no. 6, art. 968; DOI 10.3390/atmos14060968; PM<sub>2.5</sub> Concentration Forecasting Using Weighted Bi-LSTM and Random Forest Feature Importance-Based Feature Selection; Baekcheon Kim, Eunkyeong Kim, Seunghwan Jung, Minseok Kim, Jinyong Kim, Sungshin Kim (all: Department of Electrical and Electronics Engineering, Pusan National University, Busan 46241, Republic of Korea); https://www.mdpi.com/2073-4433/14/6/968; PM2.5 concentration forecasting; bidirectional long short-term memory; random forest; weight method |
title | PM<sub>2.5</sub> Concentration Forecasting Using Weighted Bi-LSTM and Random Forest Feature Importance-Based Feature Selection |
topic | PM2.5 concentration forecasting; bidirectional long short-term memory; random forest; weight method |
url | https://www.mdpi.com/2073-4433/14/6/968 |
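
Note on the performance indices: the description reports RMSE, MAE, RRMSE, and R<sup>2</sup> but this record does not define them. The sketch below shows how such indices are commonly computed. The function names are illustrative, and the RRMSE definition used here (RMSE divided by the mean observed value, in percent) is an assumption, since several variants exist and the record does not say which one the authors used.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def rrmse(y_true, y_pred):
    """Relative RMSE in percent.

    Assumption: normalized by the mean of the observed values;
    other normalizations are also in use.
    """
    mean_true = sum(y_true) / len(y_true)
    return 100.0 * rmse(y_true, y_pred) / mean_true


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up hourly PM2.5 values, for illustration only.
    observed = [10.0, 20.0, 30.0, 40.0]
    forecast = [12.0, 18.0, 33.0, 37.0]
    print("RMSE :", rmse(observed, forecast))
    print("MAE  :", mae(observed, forecast))
    print("RRMSE:", rrmse(observed, forecast))
    print("R2   :", r2(observed, forecast))
```

Lower RMSE, MAE, and RRMSE indicate a better forecast, while R<sup>2</sup> closer to 1 indicates that the forecast explains more of the variance in the observations.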