Application of Bayesian Additive Regression Trees for Estimating Daily Concentrations of PM<sub>2.5</sub> Components

Bayesian additive regression tree (BART) is a recent statistical method that combines ensemble learning and nonparametric regression. BART is constructed under a probabilistic framework that also allows for model-based prediction uncertainty quantification. We evaluated the application of BART in pr...

Bibliographic Details
Main Authors: Tianyu Zhang, Guannan Geng, Yang Liu, Howard H. Chang
Format: Article
Language: English
Published: MDPI AG 2020-11-01
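Series: Atmosphere
Subjects: regression trees; machine learning; Bayesian model; particulate matter; Community Multiscale Air Quality (CMAQ); aerosol optical depth
Online Access: https://www.mdpi.com/2073-4433/11/11/1233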
author Tianyu Zhang
Guannan Geng
Yang Liu
Howard H. Chang
collection DOAJ
description Bayesian additive regression tree (BART) is a recent statistical method that combines ensemble learning and nonparametric regression. BART is constructed under a probabilistic framework that also allows for model-based prediction uncertainty quantification. We evaluated the application of BART in predicting daily concentrations of four fine particulate matter (PM<sub>2.5</sub>) components (elemental carbon, organic carbon, nitrate, and sulfate) in California during the period 2005 to 2014. We demonstrate in this paper how BART can be tuned to optimize prediction performance and how to evaluate variable importance. Our BART models included, as predictors, a large suite of land-use variables, meteorological conditions, satellite-derived aerosol optical depth parameters, and simulations from a chemical transport model. In cross-validation experiments, BART demonstrated good out-of-sample prediction performance at monitoring locations (<i>R<sup>2</sup></i> from 0.62 to 0.73). More importantly, prediction intervals associated with concentration estimates from BART showed good coverage probability at locations with and without monitoring data. In our case study, major PM<sub>2.5</sub> components could be estimated with good accuracy, especially when collocated PM<sub>2.5</sub> total mass observations were available. In conclusion, BART is an attractive approach for modeling ambient air pollution levels, especially for its ability to provide uncertainty in estimates that may be useful for subsequent health impact and health effect analyses.
first_indexed 2024-03-10T14:49:10Z
format Article
id doaj.art-c3fc7592f2634202ad8ad862dbf1f91a
institution Directory of Open Access Journals
issn 2073-4433
language English
last_indexed 2024-03-10T14:49:10Z
publishDate 2020-11-01
publisher MDPI AG
record_format Article
series Atmosphere
spelling doaj.art-c3fc7592f2634202ad8ad862dbf1f91a
2023-11-20T21:10:49Z
eng
MDPI AG
Atmosphere
2073-4433
2020-11-01
11
11
1233
10.3390/atmos11111233
Application of Bayesian Additive Regression Trees for Estimating Daily Concentrations of PM<sub>2.5</sub> Components
Tianyu Zhang [0]
Guannan Geng [1]
Yang Liu [2]
Howard H. Chang [3]
[0] Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
[1] State Key Joint Laboratory of Environment Simulation and Pollution Control, School of Environment, Tsinghua University, Beijing 100084, China
[2] Gangarosa Department of Environmental Health, Emory University, Atlanta, GA 30322, USA
[3] Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
title Application of Bayesian Additive Regression Trees for Estimating Daily Concentrations of PM<sub>2.5</sub> Components
topic regression trees
machine learning
Bayesian model
particulate matter
Community Multiscale Air Quality (CMAQ)
aerosol optical depth
url https://www.mdpi.com/2073-4433/11/11/1233