Ensemble-based classification approach for PM2.5 concentration forecasting using meteorological data

Air pollution is a serious challenge to humankind as it poses many health threats. It can be measured using the air quality index (AQI). Air pollution is the result of contamination of both outdoor and indoor environments. The AQI is being monitored by various institutions globally. The measured air...

Bibliographic Details
Main Authors: S. Saminathan, C. Malathy
Format: Article
Language: English
Published: Frontiers Media S.A. 2023-06-01
Series: Frontiers in Big Data
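Subjects: air quality forecast; supervised machine learning; multiclass classification; imbalanced data set; SMOTE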
Online Access: https://www.frontiersin.org/articles/10.3389/fdata.2023.1175259/full
_version_ 1797808366631780352
author S. Saminathan
C. Malathy
collection DOAJ
description Air pollution is a serious challenge to humankind because it poses many health threats. It can be measured using the air quality index (AQI). Air pollution results from contamination of both outdoor and indoor environments. The AQI is monitored by various institutions globally, and the measured air quality data are mostly kept available for public use. From previously calculated AQI values, future AQI values can be predicted, or the class/category of a numeric value can be obtained. This forecast can be made more accurate using supervised machine learning methods. In this study, multiple machine learning approaches were used to classify PM2.5 values. The values of the pollutant PM2.5 were classified into different groups using logistic regression, support vector machines, random forest, extreme gradient boosting, and their grid search equivalents, along with the deep learning method multilayer perceptron. After multiclass classification with these algorithms, overall accuracy and per-class accuracy were used to compare the methods. Because the dataset was imbalanced, a SMOTE-based approach was used to balance it. The random forest multiclass classifier with SMOTE-based dataset balancing was found to be more accurate than all other classifiers that use the original dataset.
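The balancing and evaluation steps named in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `smote_oversample` and `per_class_accuracy`, the toy data, and the choice of `k` are all assumptions made for illustration, and per-class accuracy is assumed here to mean the fraction of correctly predicted samples within each true class. The sketch uses NumPy only and follows the basic SMOTE idea of interpolating between a minority sample and one of its nearest same-class neighbours.

```python
import numpy as np


def smote_oversample(X, y, k=3, seed=0):
    """Oversample every minority class up to the majority class size.

    Each synthetic point lies on the segment between a minority sample
    and one of its k nearest neighbours from the same class.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        n_new = int(target - count)
        if n_new == 0 or count < 2:
            continue  # already at target size, or no neighbour to interpolate with
        Xc = X[y == cls]
        # Pairwise Euclidean distances inside the class; exclude self-matches.
        dist = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=2)
        np.fill_diagonal(dist, np.inf)
        kk = min(k, int(count) - 1)
        neighbours = np.argsort(dist, axis=1)[:, :kk]
        base = rng.integers(0, count, size=n_new)
        picked = neighbours[base, rng.integers(0, kk, size=n_new)]
        gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
        X_parts.append(Xc[base] + gap * (Xc[picked] - Xc[base]))
        y_parts.append(np.full(n_new, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


def per_class_accuracy(y_true, y_pred):
    """Share of correct predictions within each true class (integer labels assumed)."""
    return {
        int(c): float(np.mean(y_pred[y_true == c] == c))
        for c in np.unique(y_true)
    }


# Hypothetical toy data: six samples of class 0, three of class 1.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5],
              [0.2, 0.8], [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1])

X_bal, y_bal = smote_oversample(X, y, k=2, seed=1)
print(dict(zip(*np.unique(y_bal, return_counts=True))))
```

In a workflow like the one described, a classifier such as a random forest would be trained on the balanced arrays and then compared with the other models using overall and per-class accuracy on held-out data; oversampling would normally be applied to the training split only.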
first_indexed 2024-03-13T06:36:29Z
format Article
id doaj.art-dcb60faeeda44fb2b7e048ec3bc56f72
institution Directory Open Access Journal
issn 2624-909X
language English
last_indexed 2024-03-13T06:36:29Z
publishDate 2023-06-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Big Data
spelling doaj.art-dcb60faeeda44fb2b7e048ec3bc56f72
2023-06-09T05:01:23Z
eng
Frontiers Media S.A.
Frontiers in Big Data
2624-909X
2023-06-01
6
10.3389/fdata.2023.1175259
1175259
Ensemble-based classification approach for PM2.5 concentration forecasting using meteorological data
S. Saminathan (0)
C. Malathy (1)
Department of Computing Technologies, Faculty of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, India
Department of Networking and Communications, Faculty of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, India
https://www.frontiersin.org/articles/10.3389/fdata.2023.1175259/full
air quality forecast
supervised machine learning
multiclass classification
imbalanced data set
SMOTE
title Ensemble-based classification approach for PM2.5 concentration forecasting using meteorological data
topic air quality forecast
supervised machine learning
multiclass classification
imbalanced data set
SMOTE
url https://www.frontiersin.org/articles/10.3389/fdata.2023.1175259/full