Big-But-Biased Data Analytics for Air Quality

Air pollution is one of the big concerns for smart cities. The problem of applying big data analytics to sampling bias in the context of urban air quality is studied in this paper. A nonparametric estimator that incorporates kernel density estimation is used. When ignoring the biasing weight functio...

Bibliographic Details
Main Authors: Laura Borrajo, Ricardo Cao
Format: Article
Language: English
Published: MDPI AG 2020-09-01
Series: Electronics
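Subjects: air quality, automatic bandwidth selection, big data, bootstrap, kernel density estimation, large sample size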
Online Access: https://www.mdpi.com/2079-9292/9/9/1551
_version_ 1797552992095830016
author Laura Borrajo
Ricardo Cao
author_facet Laura Borrajo
Ricardo Cao
author_sort Laura Borrajo
collection DOAJ
description Air pollution is one of the big concerns for smart cities. The problem of applying big data analytics to sampling bias in the context of urban air quality is studied in this paper. A nonparametric estimator that incorporates kernel density estimation is used. When ignoring the biasing weight function, a small-sized simple random sample of the real population is assumed to be additionally observed. The general parameter considered is the mean of a transformation of the random variable of interest. A new bootstrap algorithm is used to approximate the mean squared error of the new estimator. Its minimization leads to an automatic bandwidth selector. The method is applied to a real data set concerning the levels of different pollutants in the urban air of the city of A Coruña (Galicia, NW Spain). Estimations for the mean and the cumulative distribution function of the level of ozone and nitrogen dioxide when the temperature is greater than or equal to 30 °C based on 15 years of biased data are obtained.
first_indexed 2024-03-10T16:09:05Z
format Article
id doaj.art-cc65251855d3447290a21254352398e9
institution Directory Open Access Journal
issn 2079-9292
language English
last_indexed 2024-03-10T16:09:05Z
publishDate 2020-09-01
publisher MDPI AG
record_format Article
series Electronics
spelling doaj.art-cc65251855d3447290a21254352398e9
2023-11-20T14:40:42Z
eng
MDPI AG
Electronics
2079-9292
2020-09-01
9
9
1551
10.3390/electronics9091551
Big-But-Biased Data Analytics for Air Quality
Laura Borrajo 0
Ricardo Cao 1
Research Group MODES, Department of Mathematics, CITIC, University of A Coruña, 15071 A Coruña, Spain
Research Group MODES, Department of Mathematics, CITIC and ITMATI, University of A Coruña, 15071 A Coruña, Spain
Air pollution is one of the big concerns for smart cities. The problem of applying big data analytics to sampling bias in the context of urban air quality is studied in this paper. A nonparametric estimator that incorporates kernel density estimation is used. When ignoring the biasing weight function, a small-sized simple random sample of the real population is assumed to be additionally observed. The general parameter considered is the mean of a transformation of the random variable of interest. A new bootstrap algorithm is used to approximate the mean squared error of the new estimator. Its minimization leads to an automatic bandwidth selector. The method is applied to a real data set concerning the levels of different pollutants in the urban air of the city of A Coruña (Galicia, NW Spain). Estimations for the mean and the cumulative distribution function of the level of ozone and nitrogen dioxide when the temperature is greater than or equal to 30 °C based on 15 years of biased data are obtained.
https://www.mdpi.com/2079-9292/9/9/1551
air quality
automatic bandwidth selection
big data
bootstrap
kernel density estimation
large sample size
spellingShingle Laura Borrajo
Ricardo Cao
Big-But-Biased Data Analytics for Air Quality
Electronics
air quality
automatic bandwidth selection
big data
bootstrap
kernel density estimation
large sample size
title Big-But-Biased Data Analytics for Air Quality
title_sort big but biased data analytics for air quality
topic air quality
automatic bandwidth selection
big data
bootstrap
kernel density estimation
large sample size
url https://www.mdpi.com/2079-9292/9/9/1551
work_keys_str_mv AT lauraborrajo bigbutbiaseddataanalyticsforairquality
AT ricardocao bigbutbiaseddataanalyticsforairquality
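
The description field summarizes an estimator that combines a large biased sample with a small simple random sample through kernel density estimation. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes the correction takes the form of a density ratio (a kernel estimate of the target density from the small sample, divided by a kernel estimate of the biased density from the large sample). The function names, the synthetic data, and the normal-reference bandwidth rule are all illustrative assumptions; in particular, the rule of thumb stands in for the bootstrap bandwidth selector mentioned in the record.

```python
import numpy as np


def kde(points, sample, bandwidth):
    """Gaussian kernel density estimate built from `sample`, evaluated at `points`."""
    z = (points[:, None] - sample[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (len(sample) * bandwidth)


def rule_of_thumb(sample):
    """Normal-reference bandwidth (assumed stand-in for a bootstrap selector)."""
    return 1.06 * sample.std(ddof=1) * len(sample) ** (-0.2)


def debiased_mean(biased, srs, transform):
    """Estimate E[transform(X)] by reweighting the big biased sample.

    Each biased observation is weighted by f_hat / g_hat, where f_hat is the
    density estimated from the small simple random sample and g_hat is the
    density estimated from the biased sample itself.
    """
    f_hat = kde(biased, srs, rule_of_thumb(srs))
    g_hat = kde(biased, biased, rule_of_thumb(biased))
    return float(np.mean(transform(biased) * f_hat / g_hat))


# Synthetic illustration (hypothetical data, not the A Coruña measurements):
# the target population is N(0, 1); a weight proportional to exp(x) turns it
# into N(1, 1), so the biased sample overstates the mean by about 1.
rng = np.random.default_rng(0)
srs = rng.normal(0.0, 1.0, size=200)      # small simple random sample
biased = rng.normal(1.0, 1.0, size=2000)  # large biased sample

naive = float(biased.mean())
estimate = debiased_mean(biased, srs, lambda x: x)
print(f"naive mean: {naive:.3f}, reweighted mean: {estimate:.3f}")
```

Passing an indicator such as `lambda x: (x <= t).astype(float)` as `transform` would give a value of the cumulative distribution function at a hypothetical threshold `t`, which matches the record's statement that the parameter of interest is the mean of a transformation of the variable.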