Machine learning for improved data analysis of biological aerosol using the WIBS
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | Copernicus Publications 2018-11-01 |
Series: | Atmospheric Measurement Techniques |
Online Access: | https://www.atmos-meas-tech.net/11/6203/2018/amt-11-6203-2018.pdf |
Summary: | <p>Primary biological aerosols, including bacteria, fungal spores and pollen, have
important implications for public health and the environment. Such particles
may have different concentrations of chemical fluorophores and will respond
differently in the presence of ultraviolet light, potentially allowing for
different types of biological aerosol to be discriminated. Development of
ultraviolet light induced fluorescence (UV-LIF) instruments such as the
Wideband Integrated Bioaerosol Sensor (WIBS) has allowed for size, morphology
and fluorescence measurements to be collected in real time. However, without
studying instrument responses in the laboratory, the extent to which different
types of particles can be discriminated is unclear. Collection of
laboratory data is vital to validate any approach used to analyse data and
ensure that the available data are utilized as effectively as possible.</p><p>In this paper, a variety of methodologies are tested on a range of
particles collected in the laboratory. Hierarchical agglomerative clustering
(HAC) has been previously applied to UV-LIF data in a number of studies and
is tested alongside other algorithms that could be used to solve the
classification problem: Density-Based Spatial Clustering of Applications with Noise (DBSCAN),
<i>k</i>-means and gradient boosting.</p><p>Whilst HAC was able to effectively discriminate between reference narrow-size
distribution PSL particles, yielding a classification error of only 1.8 %,
similar results were not obtained when testing on laboratory generated
aerosol where the classification error was found to be between 11.5 % and
24.2 %. Furthermore, there is a large uncertainty in this approach in terms
of the data preparation and the cluster index used, and we were unable to
attain consistent results across the different sets of laboratory generated
aerosol tested.</p><p>The lowest classification errors were obtained using gradient boosting, where
the misclassification rate was between 4.38 % and 5.42 %. The largest
contribution to the error, in the case of the higher misclassification rate,
was the pollen samples where 28.5 % of the samples were incorrectly
classified as fungal spores. The technique was robust to changes in data
preparation, provided that a fluorescence threshold was applied to the data.</p><p>In the event that laboratory training data are unavailable, DBSCAN was found
to be a potential alternative to HAC. For one of the data sets, where 22.9 % of
the data were left unclassified, we were able to produce three distinct
clusters, obtaining a classification error of only 1.42 % on
the classified data. These results could not be replicated for the other data
set, where 26.8 % of the data were not classified and a classification error
of 13.8 % was obtained. This method, like HAC, also appeared to be heavily
dependent on data preparation, requiring a different selection of parameters
depending on the preparation used. Further analysis will also be required to
confirm our selection of the parameters when using this method on ambient
data.</p><p>There is a clear need for the collection of additional laboratory generated
aerosol to improve interpretation of current databases and to aid in the
analysis of data collected from an ambient environment. New instruments with
a greater resolution are likely to improve on current discrimination between
pollen, bacteria and fungal spores and even between different species;
however, the need for extensive laboratory data sets will grow as a result.</p> |
---|---|
ISSN: | 1867-1381 1867-8548 |
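
The summary reports three kinds of figure: a classification error on the classified data, the share of data left unclassified by DBSCAN, and the share of one particle type mistaken for another (for example, pollen classified as fungal spores). The sketch below shows one plausible way such figures could be computed from labelled laboratory data. It is not the authors' code: the function names, the `-1` noise marker, the toy labels, and the majority-vote rule for matching clusters to particle types are all assumptions made for illustration, since the record does not say how clusters were matched.

```python
from collections import Counter

NOISE = -1  # assumed marker for points a clustering step leaves unclassified


def evaluate_clustering(true_labels, cluster_ids):
    """Return (unclassified_fraction, error_on_classified, cluster_to_class)."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("inputs must have the same length")
    if not true_labels:
        raise ValueError("inputs must be non-empty")

    # Count the true particle types inside each cluster, skipping noise points.
    per_cluster = {}
    for truth, cid in zip(true_labels, cluster_ids):
        if cid != NOISE:
            per_cluster.setdefault(cid, Counter())[truth] += 1

    # Assumption: each cluster is matched to its most common true type.
    cluster_to_class = {
        cid: counts.most_common(1)[0][0] for cid, counts in per_cluster.items()
    }

    classified = [(t, c) for t, c in zip(true_labels, cluster_ids) if c != NOISE]
    unclassified_fraction = 1 - len(classified) / len(true_labels)
    if not classified:
        return unclassified_fraction, float("nan"), cluster_to_class

    # The error is computed on the classified points only.
    wrong = sum(1 for t, c in classified if cluster_to_class[c] != t)
    return unclassified_fraction, wrong / len(classified), cluster_to_class


def confusion_fraction(true_labels, predicted_labels, actual, mistaken_for):
    """Fraction of samples of type `actual` that were predicted as `mistaken_for`."""
    total = sum(1 for t in true_labels if t == actual)
    if total == 0:
        raise ValueError(f"no samples of type {actual!r}")
    hits = sum(
        1
        for t, p in zip(true_labels, predicted_labels)
        if t == actual and p == mistaken_for
    )
    return hits / total


# Toy data, invented purely for illustration (not taken from the paper).
truth = ["bacteria"] * 4 + ["fungal spores"] * 4 + ["pollen"] * 4
clusters = [0, 0, 0, -1, 1, 1, 1, 1, 2, 2, 1, -1]

unclassified, error, mapping = evaluate_clustering(truth, clusters)
print(f"unclassified: {unclassified:.1%}, error on classified data: {error:.1%}")
print("cluster to particle type:", mapping)
```

Reporting the unclassified fraction alongside the error matters because a clustering method can lower its apparent error simply by leaving difficult points unclassified, which is why the two numbers are quoted together in the summary.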