Application of Bayesian Additive Regression Trees for Estimating Daily Concentrations of PM<sub>2.5</sub> Components
Bayesian additive regression tree (BART) is a recent statistical method that combines ensemble learning and nonparametric regression. BART is constructed under a probabilistic framework that also allows for model-based prediction uncertainty quantification. We evaluated the application of BART in pr...
Main Authors: | Tianyu Zhang, Guannan Geng, Yang Liu, Howard H. Chang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2020-11-01 |
Series: | Atmosphere |
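Subjects: | regression trees, machine learning, Bayesian model, particulate matter, Community Multiscale Air Quality (CMAQ), aerosol optical depth |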
Online Access: | https://www.mdpi.com/2073-4433/11/11/1233 |
_version_ | 1797547717560369152 |
---|---|
author | Tianyu Zhang, Guannan Geng, Yang Liu, Howard H. Chang |
author_sort | Tianyu Zhang |
collection | DOAJ |
description | Bayesian additive regression tree (BART) is a recent statistical method that combines ensemble learning and nonparametric regression. BART is constructed under a probabilistic framework that also allows for model-based prediction uncertainty quantification. We evaluated the application of BART in predicting daily concentrations of four fine particulate matter (PM<sub>2.5</sub>) components (elemental carbon, organic carbon, nitrate, and sulfate) in California during the period 2005 to 2014. We demonstrate in this paper how BART can be tuned to optimize prediction performance and how to evaluate variable importance. Our BART models included, as predictors, a large suite of land-use variables, meteorological conditions, satellite-derived aerosol optical depth parameters, and simulations from a chemical transport model. In cross-validation experiments, BART demonstrated good out-of-sample prediction performance at monitoring locations (<i>R<sup>2</sup></i> from 0.62 to 0.73). More importantly, prediction intervals associated with concentration estimates from BART showed good coverage probability at locations with and without monitoring data. In our case study, major PM<sub>2.5</sub> components could be estimated with good accuracy, especially when collocated PM<sub>2.5</sub> total mass observations were available. In conclusion, BART is an attractive approach for modeling ambient air pollution levels, especially for its ability to provide uncertainty in estimates that may be useful for subsequent health impact and health effect analyses. |
first_indexed | 2024-03-10T14:49:10Z |
format | Article |
id | doaj.art-c3fc7592f2634202ad8ad862dbf1f91a |
institution | Directory Open Access Journal |
issn | 2073-4433 |
language | English |
last_indexed | 2024-03-10T14:49:10Z |
publishDate | 2020-11-01 |
publisher | MDPI AG |
record_format | Article |
series | Atmosphere |
spelling | doaj.art-c3fc7592f2634202ad8ad862dbf1f91a; 2023-11-20T21:10:49Z; eng; MDPI AG; Atmosphere; 2073-4433; 2020-11-01; 11; 11; 1233; 10.3390/atmos11111233; Application of Bayesian Additive Regression Trees for Estimating Daily Concentrations of PM<sub>2.5</sub> Components; Tianyu Zhang (0), Guannan Geng (1), Yang Liu (2), Howard H. Chang (3); (0) Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA; (1) State Key Joint Laboratory of Environment Simulation and Pollution Control, School of Environment, Tsinghua University, Beijing 100084, China; (2) Gangarosa Department of Environmental Health, Emory University, Atlanta, GA 30322, USA; (3) Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA; https://www.mdpi.com/2073-4433/11/11/1233; regression trees; machine learning; Bayesian model; particulate matter; Community Multiscale Air Quality (CMAQ); aerosol optical depth |
title | Application of Bayesian Additive Regression Trees for Estimating Daily Concentrations of PM<sub>2.5</sub> Components |
title_sort | application of bayesian additive regression trees for estimating daily concentrations of pm sub 2 5 sub components |
topic | regression trees; machine learning; Bayesian model; particulate matter; Community Multiscale Air Quality (CMAQ); aerosol optical depth |
url | https://www.mdpi.com/2073-4433/11/11/1233 |
work_keys_str_mv | AT tianyuzhang applicationofbayesianadditiveregressiontreesforestimatingdailyconcentrationsofpmsub25subcomponents AT guannangeng applicationofbayesianadditiveregressiontreesforestimatingdailyconcentrationsofpmsub25subcomponents AT yangliu applicationofbayesianadditiveregressiontreesforestimatingdailyconcentrationsofpmsub25subcomponents AT howardhchang applicationofbayesianadditiveregressiontreesforestimatingdailyconcentrationsofpmsub25subcomponents |
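
The description above reports two evaluation quantities for held-out data: an out-of-sample R<sup>2</sup> and the coverage probability of prediction intervals. The sketch below shows one common way such quantities could be computed. It is an illustration only, not the authors' code: the function names `r_squared` and `interval_coverage` are hypothetical, and the record does not say which definition of R<sup>2</sup> the paper uses (the sketch assumes 1 - SS_res / SS_tot).

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Out-of-sample R^2, assumed here to be 1 - SS_res / SS_tot.

    SS_tot is taken around the mean of the held-out observations.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def interval_coverage(
    observed: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
) -> float:
    """Fraction of held-out observations that fall inside [lower, upper]."""
    if not (len(observed) == len(lower) == len(upper)) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and equal length")
    inside = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return inside / len(observed)
```

For a nominal 95% prediction interval, an empirical coverage close to 0.95 on held-out monitors would indicate well-calibrated uncertainty; values well below that would suggest intervals that are too narrow.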