Data augmentation for bias correction in mapping PM2.5 based on satellite retrievals and ground observations

Bibliographic Details
Main Authors: Tan Mi, Die Tang, Jianbo Fu, Wen Zeng, Michael L. Grieneisen, Zihang Zhou, Fengju Jia, Fumo Yang, Yu Zhan
Format: Article
Language: English
Published: Elsevier, 2024-01-01
Series: Geoscience Frontiers
Online Access: http://www.sciencedirect.com/science/article/pii/S1674987123001536

Description

As most air quality monitoring sites worldwide are in urban areas, machine learning models may produce substantial estimation bias in rural areas when deriving the spatiotemporal distributions of air pollutants. The bias stems from dataset shift: the density distributions of the predictor variables differ greatly between urban and rural areas. We propose a data-augmentation approach based on multiple imputation by chained equations (MICE-DA) to remedy the dataset shift problem. Compared with the benchmark models, MICE-DA exhibits superior predictive performance in deriving the spatiotemporal distributions of hourly PM2.5 in the megacity of Chengdu, at the foot of the Tibetan Plateau. It performs especially well at correcting the estimation bias, with the mean bias decreasing from −3.4 µg/m³ to −1.6 µg/m³. As a complement to the holdout validation, the semi-variance results show that MICE-DA adequately preserves the spatial autocorrelation pattern of PM2.5 over the study area. The essence of MICE-DA lies in strengthening the correlation between PM2.5 and aerosol optical depth (AOD) during the data augmentation. Consequently, the importance of AOD for predicting PM2.5 is substantially enhanced, and the summed relative importance of the two satellite-retrieved AOD variables increases from 5.5% to 18.4%. This study resolves the puzzle of why AOD has exhibited relatively low importance in local or regional studies. The results can advance the use of satellite remote sensing in air quality modeling while drawing more attention to the common dataset shift problem in data-driven environmental research.
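The abstract does not describe how the augmented training set is built. The Python sketch below is only a rough illustration of generic chained-equations imputation used for data augmentation. It is not the authors' implementation, and it rests on assumptions that do not come from the paper:

- Each record is a list `[pm25, aod, covariate]`, with `None` marking a missing value.
- Monitoring-site records have observed PM2.5, while hypothetical rural grid cells carry predictors only.
- Each imputation step is a linear regression plus Gaussian residual noise.

Full MICE also draws the regression coefficients from their posterior, and this simplified version omits that step. The helper names `solve`, `ols`, `mice` and `augment` are hypothetical.

```python
import random
from statistics import mean, pstdev


def solve(a, b):
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(xs, ys, ridge=1e-6):
    """Fit y ~ intercept + xs by least squares (normal equations, tiny ridge on slopes).

    Returns the coefficients [intercept, slopes...] and the residual standard deviation.
    """
    rows = [[1.0] + list(x) for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j and i > 0 else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    beta = solve(xtx, xty)
    resid = [y - sum(bk * v for bk, v in zip(beta, r)) for r, y in zip(rows, ys)]
    return beta, pstdev(resid)


def mice(records, n_iter=10, seed=0):
    """Return one completed copy of `records` (None = missing); the input is not modified.

    Missing entries start at their column mean. Each incomplete column is then
    repeatedly regressed on all other columns (the "chained equations"), and its
    missing entries are redrawn as prediction + Gaussian residual noise.
    """
    rng = random.Random(seed)
    k = len(records[0])
    missing = [[v is None for v in row] for row in records]
    data = [list(row) for row in records]
    for j in range(k):
        mu = mean(row[j] for row in records if row[j] is not None)
        for row in data:
            if row[j] is None:
                row[j] = mu
    for _ in range(n_iter):
        for j in range(k):
            if not any(flags[j] for flags in missing):
                continue  # fully observed column: nothing to impute
            obs = [i for i in range(len(data)) if not missing[i][j]]
            xs = [[data[i][c] for c in range(k) if c != j] for i in obs]
            beta, sigma = ols(xs, [data[i][j] for i in obs])
            for i, row in enumerate(data):
                if missing[i][j]:
                    x = [1.0] + [row[c] for c in range(k) if c != j]
                    row[j] = sum(bk * v for bk, v in zip(beta, x)) + rng.gauss(0.0, sigma)
    return data


def augment(sites, rural_cells, m=5, seed=0):
    """Pool labelled site records with unlabelled rural cells; return m imputed datasets.

    Hypothetical layout: column 0 = PM2.5 (None for rural cells), other columns = predictors.
    """
    pooled = sites + rural_cells
    return [mice(pooled, seed=seed + s) for s in range(m)]


if __name__ == "__main__":
    rng = random.Random(1)
    sites, rural = [], []
    for _ in range(40):  # synthetic urban sites: higher AOD, PM2.5 observed
        aod, hum = rng.uniform(0.4, 1.2), rng.uniform(50, 90)
        pm = 20 + 60 * aod + 0.1 * hum + rng.gauss(0, 5)
        sites.append([pm, aod if rng.random() > 0.2 else None, hum])
    for _ in range(30):  # synthetic rural cells: lower AOD, PM2.5 unknown
        aod, hum = rng.uniform(0.1, 0.6), rng.uniform(40, 80)
        rural.append([None, aod if rng.random() > 0.3 else None, hum])
    datasets = augment(sites, rural, m=5)
    print(len(datasets), "augmented datasets of", len(datasets[0]), "records each")
```

Each of the `m` completed datasets could then train the PM2.5 model, so that its training distribution also covers rural predictor values rather than only the urban range seen at monitoring sites. Averaging the resulting predictions is one common way to pool multiple imputations. The paper's actual model, predictors and pooling scheme are not given in this record.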

Collection: Directory of Open Access Journals (DOAJ)
ISSN: 1674-9871
Volume: 15
Issue: 1
Article number: 101686
Keywords: Aerosol optical depth; Dataset shift; Spatiotemporal distribution; Air quality monitoring; Multiple imputation by chained equations

Affiliations:
Tan Mi: College of Carbon Neutrality Future Technology, Sichuan University, Chengdu, Sichuan 610065, China; Department of Environmental Science and Engineering, Sichuan University, Chengdu, Sichuan 610065, China
Die Tang: Department of Environmental Science and Engineering, Sichuan University, Chengdu, Sichuan 610065, China
Jianbo Fu: Department of Environmental Science and Engineering, Sichuan University, Chengdu, Sichuan 610065, China
Wen Zeng: Institute for Disaster Management and Reconstruction, Sichuan University, Chengdu, Sichuan 610200, China
Michael L. Grieneisen: Department of Land, Air, and Water Resources, University of California, Davis, CA 95616, United States
Zihang Zhou: Chengdu Academy of Environmental Sciences, Chengdu, Sichuan 610072, China
Fengju Jia: Sichuan Chengdu Ecological and Environment Monitoring Center, Chengdu, Sichuan 610011, China
Fumo Yang: College of Carbon Neutrality Future Technology, Sichuan University, Chengdu, Sichuan 610065, China
Yu Zhan (corresponding author): College of Carbon Neutrality Future Technology, Sichuan University, Chengdu, Sichuan 610065, China; Department of Environmental Science and Engineering, Sichuan University, Chengdu, Sichuan 610065, China