Big-But-Biased Data Analytics for Air Quality
Air pollution is one of the big concerns for smart cities. The problem of applying big data analytics to sampling bias in the context of urban air quality is studied in this paper. A nonparametric estimator that incorporates kernel density estimation is used. When ignoring the biasing weight functio...
Main Authors: | Laura Borrajo, Ricardo Cao |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2020-09-01 |
Series: | Electronics |
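Subjects: | air quality, automatic bandwidth selection, big data, bootstrap, kernel density estimation, large sample size |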
Online Access: | https://www.mdpi.com/2079-9292/9/9/1551 |
_version_ | 1797552992095830016 |
---|---|
author | Laura Borrajo Ricardo Cao |
author_facet | Laura Borrajo Ricardo Cao |
author_sort | Laura Borrajo |
collection | DOAJ |
description | Air pollution is one of the big concerns for smart cities. The problem of applying big data analytics to sampling bias in the context of urban air quality is studied in this paper. A nonparametric estimator that incorporates kernel density estimation is used. When ignoring the biasing weight function, a small-sized simple random sample of the real population is assumed to be additionally observed. The general parameter considered is the mean of a transformation of the random variable of interest. A new bootstrap algorithm is used to approximate the mean squared error of the new estimator. Its minimization leads to an automatic bandwidth selector. The method is applied to a real data set concerning the levels of different pollutants in the urban air of the city of A Coruña (Galicia, NW Spain). Estimations for the mean and the cumulative distribution function of the level of ozone and nitrogen dioxide when the temperature is greater than or equal to 30 °C based on 15 years of biased data are obtained. |
first_indexed | 2024-03-10T16:09:05Z |
format | Article |
id | doaj.art-cc65251855d3447290a21254352398e9 |
institution | Directory Open Access Journal |
issn | 2079-9292 |
language | English |
last_indexed | 2024-03-10T16:09:05Z |
publishDate | 2020-09-01 |
publisher | MDPI AG |
record_format | Article |
series | Electronics |
spelling | doaj.art-cc65251855d3447290a21254352398e92023-11-20T14:40:42ZengMDPI AGElectronics2079-92922020-09-0199155110.3390/electronics9091551Big-But-Biased Data Analytics for Air QualityLaura Borrajo0Ricardo Cao1Research Group MODES, Department of Mathematics, CITIC, University of A Coruña, 15071 A Coruña, SpainResearch Group MODES, Department of Mathematics, CITIC and ITMATI, University of A Coruña, 15071 A Coruña, SpainAir pollution is one of the big concerns for smart cities. The problem of applying big data analytics to sampling bias in the context of urban air quality is studied in this paper. A nonparametric estimator that incorporates kernel density estimation is used. When ignoring the biasing weight function, a small-sized simple random sample of the real population is assumed to be additionally observed. The general parameter considered is the mean of a transformation of the random variable of interest. A new bootstrap algorithm is used to approximate the mean squared error of the new estimator. Its minimization leads to an automatic bandwidth selector. The method is applied to a real data set concerning the levels of different pollutants in the urban air of the city of A Coruña (Galicia, NW Spain). Estimations for the mean and the cumulative distribution function of the level of ozone and nitrogen dioxide when the temperature is greater than or equal to 30 °C based on 15 years of biased data are obtained.https://www.mdpi.com/2079-9292/9/9/1551air qualityautomatic bandwidth selectionbig databootstrapkernel density estimationlarge sample size |
spellingShingle | Laura Borrajo Ricardo Cao Big-But-Biased Data Analytics for Air Quality Electronics air quality automatic bandwidth selection big data bootstrap kernel density estimation large sample size |
title | Big-But-Biased Data Analytics for Air Quality |
title_full | Big-But-Biased Data Analytics for Air Quality |
title_fullStr | Big-But-Biased Data Analytics for Air Quality |
title_full_unstemmed | Big-But-Biased Data Analytics for Air Quality |
title_short | Big-But-Biased Data Analytics for Air Quality |
title_sort | big but biased data analytics for air quality |
topic | air quality automatic bandwidth selection big data bootstrap kernel density estimation large sample size |
url | https://www.mdpi.com/2079-9292/9/9/1551 |
work_keys_str_mv | AT lauraborrajo bigbutbiaseddataanalyticsforairquality AT ricardocao bigbutbiaseddataanalyticsforairquality |
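
The description in this record outlines a kernel-based correction for a large but biased sample, aided by a small simple random sample. The following Python sketch illustrates that general idea only; it is not the authors' estimator or their bootstrap bandwidth selector. It assumes the biased density is proportional to w(x) f(x), estimates 1/w up to a constant as the ratio of two Gaussian kernel density estimates, and uses a rule-of-thumb bandwidth as a stand-in. All function names and the simulated data are hypothetical.

```python
import numpy as np


def gaussian_kde(points, sample, bandwidth):
    """Gaussian kernel density estimate built from `sample`, evaluated at `points`."""
    z = (points[:, None] - sample[None, :]) / bandwidth
    norm = sample.size * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm


def rule_of_thumb_bandwidth(sample):
    """Silverman-style bandwidth; a stand-in, NOT the bootstrap selector of the paper."""
    return 1.06 * sample.std(ddof=1) * sample.size ** (-0.2)


def bias_corrected_mean(big_sample, srs, u):
    """Weighted mean of u over the biased sample, with weights f_hat / g_hat.

    g_hat is the density estimate from the big biased sample and f_hat the
    estimate from the small simple random sample; their ratio is
    proportional to 1 / w under the assumed model g = w * f / const.
    """
    g_hat = gaussian_kde(big_sample, big_sample, rule_of_thumb_bandwidth(big_sample))
    f_hat = gaussian_kde(big_sample, srs, rule_of_thumb_bandwidth(srs))
    weights = f_hat / g_hat
    return np.sum(u(big_sample) * weights) / np.sum(weights)


# Simulated illustration (hypothetical data, not the A Coruña measurements):
# the population is N(0, 1); biasing with w(x) proportional to exp(x)
# turns the observed density into N(1, 1).
rng = np.random.default_rng(0)
big_sample = rng.normal(loc=1.0, scale=1.0, size=2000)  # large, biased
srs = rng.normal(loc=0.0, scale=1.0, size=200)          # small, unbiased

naive = big_sample.mean()
corrected = bias_corrected_mean(big_sample, srs, lambda x: x)
print(f"naive mean of biased sample: {naive:.3f}")
print(f"bias-corrected estimate:     {corrected:.3f}")
```

In this simulation the true population mean is 0, so the naive average of the biased sample sits near 1 while the reweighted estimate should fall much closer to 0. Passing an indicator function as `u` would give an estimate of the cumulative distribution function at a point, the other parameter mentioned in the description.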