A hybrid model for estimating the number concentration of ultrafine particles based on machine learning algorithms in central Taiwan

Modeling is a cost-effective way to estimate ultrafine particle (UFP) levels. Previous UFP estimates generally relied on land-use regression, which offers insufficient temporal resolution. We carried out in-situ UFP measurements in central Taiwan and developed a model incorporating satellite-based me...

Bibliographic Details
Main Authors: Chau-Ren Jung, Wei-Ting Chen, Li-Hao Young, Ta-Chih Hsiao
Format: Article
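Language: English
Published: Elsevier 2023-05-01
Series: Environment International
Subjects: Estimation model; Feature importance; Machine learning; Meteorological variables; Satellite-based measurement; Ultrafine particles
Online Access: http://www.sciencedirect.com/science/article/pii/S0160412023002106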
author Chau-Ren Jung
Wei-Ting Chen
Li-Hao Young
Ta-Chih Hsiao
collection DOAJ
description Modeling is a cost-effective way to estimate ultrafine particle (UFP) levels. Previous UFP estimates generally relied on land-use regression, which offers insufficient temporal resolution. We carried out in-situ UFP measurements in central Taiwan and developed a model incorporating satellite-based measurements, meteorological variables, and land-use data to estimate daily UFP levels at a 1-km resolution. Two sampling campaigns, conducted in 2008–2010 and 2017–2021, measured hourly UFP concentrations at six sites using scanning mobility particle sizers. Three machine learning algorithms, namely random forest, eXtreme gradient boosting (XGBoost), and a deep neural network, were used to develop UFP estimation models. Model performance was evaluated with 10-fold cross-validation, temporal validation, and spatial validation. A total of 1,022 effective sampling days were obtained. The XGBoost model performed best, with a training coefficient of determination (R²) of 0.99 [normalized root mean square error (nRMSE): 6.52%] and a cross-validation R² of 0.78 (nRMSE: 31.0%). The ten most important variables were surface pressure, distance to the nearest road, temperature, calendar year, day of the year, NO₂, meridional wind, the total length of roads, PM2.5, and zonal wind. UFP levels were elevated along the main roads across different seasons, suggesting that traffic emissions are an important contributor to UFP. This hybrid model outperformed prior land-use regression models and can therefore provide more accurate UFP estimates for epidemiological studies.
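The two metrics quoted in the description, R2 and nRMSE, can be illustrated with a minimal Python sketch. The record does not state how the authors normalized the RMSE, so dividing by the mean of the observations is an assumption here (normalizing by the range is another common convention). The function names and sample values are hypothetical and are not taken from the study.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def nrmse_percent(observed, predicted):
    """RMSE as a percentage of the mean observation (assumed normalization)."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100.0 * rmse / (sum(observed) / n)


# Made-up daily UFP number concentrations (particles per cm^3), illustration only.
observed = [10000.0, 12000.0, 8000.0, 15000.0]
predicted = [11000.0, 11000.0, 9000.0, 14000.0]

print(f"R2 = {r_squared(observed, predicted):.3f}")
print(f"nRMSE = {nrmse_percent(observed, predicted):.2f}%")
```

In a 10-fold cross-validation, these functions would be applied to predictions for held-out folds only, which is why a cross-validation R2 is typically lower than the training R2.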
first_indexed 2024-03-13T10:22:34Z
format Article
id doaj.art-2beefbc5e29a4d648240a313f0fcf67d
institution Directory Open Access Journal
issn 0160-4120
language English
last_indexed 2024-03-13T10:22:34Z
publishDate 2023-05-01
publisher Elsevier
record_format Article
series Environment International
spelling doaj.art-2beefbc5e29a4d648240a313f0fcf67d
2023-05-20T04:29:13Z
eng
Elsevier
Environment International
0160-4120
2023-05-01
175
107937
A hybrid model for estimating the number concentration of ultrafine particles based on machine learning algorithms in central Taiwan
Chau-Ren Jung (0)
Wei-Ting Chen (1)
Li-Hao Young (2)
Ta-Chih Hsiao (3)
(0) Department of Public Health, College of Public Health, China Medical University, Taichung, Taiwan; Japan Environment and Children’s Study Programme Office, Health and Environmental Risk Division, National Institute for Environmental Studies, Tsukuba, Japan; Corresponding author at: Department of Public Health, College of Public Health, China Medical University, No. 100, Sec. 1, Jingmao Rd., Beitun Dist., Taichung City 406040, Taiwan, ROC.
(1) Department of Atmospheric Sciences, National Taiwan University, Taipei, Taiwan
(2) Department of Occupational Safety and Health, China Medical University, Taichung, Taiwan
(3) Graduate Institute of Environmental Engineering, National Taiwan University, Taipei, Taiwan
http://www.sciencedirect.com/science/article/pii/S0160412023002106
Estimation model
Feature importance
Machine learning
Meteorological variables
Satellite-based measurement
Ultrafine particles
title A hybrid model for estimating the number concentration of ultrafine particles based on machine learning algorithms in central Taiwan
topic Estimation model
Feature importance
Machine learning
Meteorological variables
Satellite-based measurement
Ultrafine particles
url http://www.sciencedirect.com/science/article/pii/S0160412023002106