Machine learning versus linear regression modelling approach for accurate ozone concentrations prediction

High levels of tropospheric ozone, exceeding the allowable limit, have been frequently reported in Malaysia. This study proposes an accurate model based on machine learning algorithms to predict tropospheric ozone concentration in major cities located in Kuala Lumpur and Selangor, Malaysia. The...

Bibliographic Details
Main Authors: Ellysia Jumin, Nuratiah Zaini, Ali Najah Ahmed, Samsuri Abdullah, Marzuki Ismail, Mohsen Sherif, Ahmed Sefelnasr, Ahmed El-Shafie
Format: Article
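Language: English
Published: Taylor & Francis Group 2020-01-01
Series: Engineering Applications of Computational Fluid Mechanics
Subjects: ozone concentration prediction; machine learning algorithm; ozone precursors; boosted decision tree regression; neural network; linear regression; pearson correlation
Online Access: http://dx.doi.org/10.1080/19942060.2020.1758792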
author Ellysia Jumin
Nuratiah Zaini
Ali Najah Ahmed
Samsuri Abdullah
Marzuki Ismail
Mohsen Sherif
Ahmed Sefelnasr
Ahmed El-Shafie
collection DOAJ
description High levels of tropospheric ozone, exceeding the allowable limit, have been frequently reported in Malaysia. This study proposes an accurate model based on machine learning algorithms to predict tropospheric ozone concentration in major cities located in Kuala Lumpur and Selangor, Malaysia. The proposed models were developed using three years of historical data for different parameters as input to predict 24-hour and 12-hour tropospheric ozone concentrations. Three machine learning algorithms were investigated: Linear Regression, Neural Network and Boosted Decision Tree. The results revealed that wind speed, humidity, nitrogen oxide, carbon monoxide and nitrogen dioxide have a significant influence on ozone formation. Boosted Decision Tree outperformed the Linear Regression and Neural Network algorithms at all stations. Model performance improved when the 12-hour dataset was used instead of the 24-hour dataset, with R² values of 0.91, 0.88 and 0.87 for the three investigated stations. To assess the uncertainty of the Boosted Decision Tree model, the 95% prediction uncertainty (95PPU) and d-factor were introduced. The 95PPU values were about 94.4%, 93.4% and 96.7%, and the d-factors were 0.001015, 0.001016 and 0.001124, for stations S1, S2 and S3, respectively. The results provide a reliable prediction model that mimics actual ozone concentrations at different locations in Malaysia.
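The uncertainty measures named in the abstract can be illustrated with a minimal sketch. This record does not give the paper's formulas, so the definitions below are assumptions based on common usage: the 95PPU band is taken as the 2.5th to 97.5th percentile range of an ensemble of predictions at each time step, coverage is the percentage of observations falling inside that band, and the d-factor is the mean band width divided by the standard deviation of the observations. The function names and the ensemble layout are hypothetical and are not taken from the article.

```python
from statistics import mean, pstdev


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in 0..100) of a pre-sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def ppu95_and_d_factor(ensemble, observed):
    """Return (coverage in percent, d-factor) under the assumed definitions.

    ensemble: list of model runs, each a list with one prediction per time step
    observed: list of measured values, one per time step
    """
    lowers, uppers = [], []
    for t in range(len(observed)):
        # Predictions of every run at time step t, sorted for the percentile helper.
        column = sorted(run[t] for run in ensemble)
        lowers.append(percentile(column, 2.5))
        uppers.append(percentile(column, 97.5))

    # Share of observations bracketed by the 95PPU band.
    inside = [lo <= y <= hi for lo, y, hi in zip(lowers, observed, uppers)]
    coverage = 100.0 * sum(inside) / len(observed)

    # Mean band width relative to the spread of the observations (assumed definition).
    widths = [hi - lo for lo, hi in zip(lowers, uppers)]
    d_factor = mean(widths) / pstdev(observed)
    return coverage, d_factor
```

Under these assumptions, a coverage close to 100% together with a small d-factor indicates a narrow band that still brackets most observations, which is how the values reported for S1, S2 and S3 would be read.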
first_indexed 2024-12-20T01:27:00Z
format Article
id doaj.art-06312a6c996d49d8b8b254768ec80021
institution Directory Open Access Journal
issn 1994-2060
1997-003X
language English
last_indexed 2024-12-20T01:27:00Z
publishDate 2020-01-01
publisher Taylor & Francis Group
record_format Article
series Engineering Applications of Computational Fluid Mechanics
spelling doaj.art-06312a6c996d49d8b8b254768ec80021 2022-12-21T19:58:13Z eng
Taylor & Francis Group
Engineering Applications of Computational Fluid Mechanics, ISSN 1994-2060, 1997-003X
2020-01-01, vol. 14, no. 1, pp. 713-725
doi:10.1080/19942060.2020.1758792 (article 1758792)
Ellysia Jumin: Department of Civil Engineering, College of Engineering, Universiti Tenaga Nasional (UNITEN)
Nuratiah Zaini: Department of Civil Engineering, College of Engineering, Universiti Tenaga Nasional (UNITEN)
Ali Najah Ahmed: Institute for Energy Infrastructure (IEI), Universiti Tenaga Nasional (UNITEN)
Samsuri Abdullah: Air Quality and Environment Research Group, Faculty of Ocean Engineering Technology and Informatics, Universiti Malaysia Terengganu
Marzuki Ismail: Faculty of Science and Marine Environment, Universiti Malaysia Terengganu
Mohsen Sherif: National Water Center (NWC), United Arab Emirates University
Ahmed Sefelnasr: National Water Center (NWC), United Arab Emirates University
Ahmed El-Shafie: National Water Center (NWC), United Arab Emirates University
title Machine learning versus linear regression modelling approach for accurate ozone concentrations prediction
topic ozone concentration prediction
machine learning algorithm
ozone precursors
boosted decision tree regression
neural network
linear regression
pearson correlation
url http://dx.doi.org/10.1080/19942060.2020.1758792