Ensemble-based classification approach for PM2.5 concentration forecasting using meteorological data
Air pollution is a serious challenge to humankind as it poses many health threats. It can be measured using the air quality index (AQI). Air pollution is the result of contamination of both outdoor and indoor environments. The AQI is being monitored by various institutions globally. The measured air...
Main Authors: | S. Saminathan, C. Malathy |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A. 2023-06-01 |
Series: | Frontiers in Big Data |
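Subjects: | air quality forecast, supervised machine learning, multiclass classification, imbalanced data set, SMOTE |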
Online Access: | https://www.frontiersin.org/articles/10.3389/fdata.2023.1175259/full |
_version_ | 1797808366631780352 |
---|---|
author | S. Saminathan; C. Malathy |
author_facet | S. Saminathan; C. Malathy |
author_sort | S. Saminathan |
collection | DOAJ |
description | Air pollution is a serious challenge to humankind because it poses many health threats. It can be measured using the air quality index (AQI). Air pollution results from contamination of both outdoor and indoor environments. The AQI is monitored by various institutions globally, and the measured air quality data are mostly kept for public use. From previously calculated AQI values, future AQI values can be predicted, or the class/category of a numeric value can be obtained. This forecast can be made more accurate using supervised machine learning methods. In this study, multiple machine learning approaches were used to classify PM2.5 values. The PM2.5 values were classified into different groups using logistic regression, support vector machines, random forest, extreme gradient boosting, and their grid search equivalents, along with the deep learning method multilayer perceptron. After multiclass classification with these algorithms, accuracy and per-class accuracy were used to compare the methods. Because the dataset was imbalanced, a SMOTE-based approach was used to balance it. The random forest multiclass classifier with SMOTE-based dataset balancing achieved better accuracy than all other classifiers that used the original dataset. |
first_indexed | 2024-03-13T06:36:29Z |
format | Article |
id | doaj.art-dcb60faeeda44fb2b7e048ec3bc56f72 |
institution | Directory of Open Access Journals |
issn | 2624-909X |
language | English |
last_indexed | 2024-03-13T06:36:29Z |
publishDate | 2023-06-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Big Data |
spelling | doaj.art-dcb60faeeda44fb2b7e048ec3bc56f72; 2023-06-09T05:01:23Z; eng; Frontiers Media S.A.; Frontiers in Big Data; 2624-909X; 2023-06-01; 6; 10.3389/fdata.2023.1175259; 1175259; Ensemble-based classification approach for PM2.5 concentration forecasting using meteorological data; S. Saminathan (0); C. Malathy (1); (0) Department of Computing Technologies, Faculty of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, India; (1) Department of Networking and Communications, Faculty of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, India; https://www.frontiersin.org/articles/10.3389/fdata.2023.1175259/full; air quality forecast; supervised machine learning; multiclass classification; imbalanced data set; SMOTE |
title | Ensemble-based classification approach for PM2.5 concentration forecasting using meteorological data |
topic | air quality forecast; supervised machine learning; multiclass classification; imbalanced data set; SMOTE |
url | https://www.frontiersin.org/articles/10.3389/fdata.2023.1175259/full |
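
The description above mentions SMOTE-based balancing of an imbalanced dataset and per-class accuracy as an evaluation measure. The following is a minimal, standard-library sketch of both ideas, not the authors' implementation. The function names `smote_oversample` and `per_class_accuracy`, the toy feature values, and the class labels are all hypothetical and chosen only for illustration; the record does not state which features or class boundaries the study used. In practice a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would likely be preferable.

```python
import random
from collections import Counter, defaultdict


def smote_oversample(X, y, k=5, seed=0):
    """Balance classes by synthesizing minority samples (SMOTE-style).

    Each synthetic point lies on the segment between an original minority
    sample and one of its k nearest neighbours from the same class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    # Every class is grown to the size of the largest class.
    target = max(len(rows) for rows in by_class.values())

    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        if len(rows) < 2:
            continue  # no same-class neighbour to interpolate with
        for _ in range(target - len(rows)):
            base_idx = rng.randrange(len(rows))
            base = rows[base_idx]
            # k nearest same-class neighbours by squared Euclidean distance
            neighbours = sorted(
                (r for i, r in enumerate(rows) if i != base_idx),
                key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # interpolation factor in [0, 1)
            X_out.append([a + gap * (b - a) for a, b in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


def per_class_accuracy(y_true, y_pred):
    """Fraction of correctly predicted samples within each true class."""
    total, correct = Counter(y_true), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            correct[t] += 1
    return {label: correct[label] / total[label] for label in total}


# Hypothetical toy data: two made-up meteorological features per sample.
X = [[20.0, 40.0], [21.0, 42.0], [22.0, 41.0], [23.0, 43.0],
     [30.0, 70.0], [31.0, 72.0],
     [35.0, 90.0], [36.0, 88.0]]
y = ["good"] * 4 + ["moderate"] * 2 + ["unhealthy"] * 2

X_bal, y_bal = smote_oversample(X, y, k=1)
print(Counter(y_bal))
print(per_class_accuracy(["a", "a", "b"], ["a", "b", "b"]))
```

In this sketch, oversampling would be applied to the training split only, so that synthetic points do not leak into the data used to compute per-class accuracy.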