Comprehensive comparison of various machine learning algorithms for short-term ozone concentration prediction

Bibliographic Details
Main Authors: Ayman Yafouz, Nouar AlDahoul, Ahmed H. Birima, Ali Najah Ahmed, Mohsen Sherif, Ahmed Sefelnasr, Mohammed Falah Allawi, Ahmed Elshafie
Format: Article
Language: English
Published: Elsevier 2022-06-01
Series: Alexandria Engineering Journal
Online Access: http://www.sciencedirect.com/science/article/pii/S1110016821006918
Description
Summary: Ozone (O3) is a common air pollutant. Rising ozone concentrations can harm public health and the environment, including vegetation and crops. Atmospheric air quality monitoring systems were therefore established to monitor and predict ozone concentration. Because ozone formation is complex and depends on both ozone precursors and meteorological conditions, various machine learning (ML) models need to be examined and evaluated for ozone concentration prediction. This study uses several ML models, namely Linear Regression (LR), Tree Regression (TR), Support Vector Regression (SVR), Ensemble Regression (ER), Gaussian Process Regression (GPR), and Artificial Neural Networks (ANN), to predict tropospheric ozone (O3) from an ozone concentration dataset. The dataset consists of hourly average observations from air quality monitoring systems at three stations in Peninsular Malaysia: Putrajaya, Kelang, and KL. The prediction models were trained on this dataset and validated, with their hyperparameters optimized. Model performance was evaluated in terms of RMSE, MAE, R2, and training time. The results indicated that LR, SVR, GPR, and ANN, with specific hyperparameters, gave the highest R2 at the Kelang and KL stations (83% and 89%, respectively). At the Putrajaya station, SVR and ER outperformed the other models in terms of R2 (79%). Overall, despite slight performance differences, several of the developed models learned the patterns well and provided good prediction performance in terms of R2, RMSE, and MAE. Ensemble regression models were found to balance high prediction accuracy in terms of R2 against low training time, and are thus considered a feasible solution for ozone concentration prediction from hourly data.
ISSN: 1110-0168
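
The summary reports model quality as RMSE, MAE, and R2. As a rough illustration of what those three metrics measure, the sketch below computes them with the Python standard library only. It uses the standard textbook definitions, which are assumed here and may differ in detail from the paper's own evaluation code. The function names and the sample values are hypothetical and are not taken from the study's dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error: mean of the absolute residuals."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical hourly ozone readings and model predictions (illustrative only).
observed = [30.0, 42.0, 55.0, 61.0, 48.0]
predicted = [28.0, 45.0, 52.0, 63.0, 50.0]

print(f"RMSE = {rmse(observed, predicted):.3f}")
print(f"MAE  = {mae(observed, predicted):.3f}")
print(f"R2   = {r2(observed, predicted):.3f}")
```

An R2 reported as a percentage, as in the summary, corresponds to this value multiplied by 100, so an R2 of 89% is 0.89 on this scale.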