Machine learning for improved data analysis of biological aerosol using the WIBS
<p>Primary biological aerosols, including bacteria, fungal spores and pollen, have important implications for public health and the environment. Such particles may have different concentrations of chemical fluorophores and will respond differently in the presence of ultraviolet light, potentia...
Main Authors: | S. Ruske, D. O. Topping, V. E. Foot, A. P. Morse, M. W. Gallagher |
---|---|
Format: | Article |
Language: | English |
Published: | Copernicus Publications 2018-11-01 |
Series: | Atmospheric Measurement Techniques |
Online Access: | https://www.atmos-meas-tech.net/11/6203/2018/amt-11-6203-2018.pdf |
_version_ | 1818038822573703168 |
---|---|
author | S. Ruske D. O. Topping V. E. Foot A. P. Morse M. W. Gallagher |
author_facet | S. Ruske D. O. Topping V. E. Foot A. P. Morse M. W. Gallagher |
author_sort | S. Ruske |
collection | DOAJ |
description | <p>Primary biological aerosols, including bacteria, fungal spores and pollen, have
important implications for public health and the environment. Such particles
may have different concentrations of chemical fluorophores and will respond
differently in the presence of ultraviolet light, potentially allowing for
different types of biological aerosol to be discriminated. Development of
ultraviolet light induced fluorescence (UV-LIF) instruments such as the
Wideband Integrated Bioaerosol Sensor (WIBS) has allowed for size, morphology
and fluorescence measurements to be collected in real time. However, without
studying instrument responses in the laboratory, it is unclear to what extent
different types of particles can be discriminated. Collection of
laboratory data is vital to validate any approach used to analyse data and
ensure that the data available are utilized as effectively as possible.</p><p>In this paper a variety of methodologies are tested on a range of
particles collected in the laboratory. Hierarchical agglomerative clustering
(HAC) has been previously applied to UV-LIF data in a number of studies and
is tested alongside other algorithms that could be used to solve the
classification problem: Density-Based Spatial Clustering of Applications with Noise (DBSCAN),
<i>k</i>-means and gradient boosting.</p><p>Whilst HAC was able to effectively discriminate between reference narrow-size
distribution PSL particles, yielding a classification error of only 1.8 %,
similar results were not obtained when testing on laboratory generated
aerosol where the classification error was found to be between 11.5 % and
24.2 %. Furthermore, there is a large uncertainty in this approach in terms
of the data preparation and the cluster index used, and we were unable to
attain consistent results across the different sets of laboratory generated
aerosol tested.</p><p>The lowest classification errors were obtained using gradient boosting, where
the misclassification rate was between 4.38 % and 5.42 %. The largest
contribution to the error, in the case of the higher misclassification rate,
was the pollen samples where 28.5 % of the samples were incorrectly
classified as fungal spores. The technique was robust to changes in data
preparation, provided a fluorescence threshold was applied to the data.</p><p>In the event that laboratory training data are unavailable, DBSCAN was found
to be a potential alternative to HAC. In the case of one of the data sets
where 22.9 % of the data were left unclassified we were able to produce
three distinct clusters obtaining a classification error of only 1.42 % on
the classified data. These results could not be replicated for the other data
set where 26.8 % of the data were not classified and a classification error
of 13.8 % was obtained. This method, like HAC, also appeared to be heavily
dependent on data preparation, requiring a different selection of parameters
depending on the preparation used. Further analysis will also be required to
confirm our selection of the parameters when using this method on ambient
data.</p><p>There is a clear need for the collection of additional laboratory generated
aerosol to improve interpretation of current databases and to aid in the
analysis of data collected from an ambient environment. New instruments with
a greater resolution are likely to improve on current discrimination between
pollen, bacteria and fungal spores and even between different species;
however, the need for extensive laboratory data sets will grow as a result.</p> |
first_indexed | 2024-12-10T07:48:51Z |
format | Article |
id | doaj.art-dcd27422b9ad4a498aac59324f897607 |
institution | Directory of Open Access Journals |
issn | 1867-1381 1867-8548 |
language | English |
last_indexed | 2024-12-10T07:48:51Z |
publishDate | 2018-11-01 |
publisher | Copernicus Publications |
record_format | Article |
series | Atmospheric Measurement Techniques |
spelling | doaj.art-dcd27422b9ad4a498aac59324f897607; 2022-12-22T01:57:06Z; eng; Copernicus Publications; Atmospheric Measurement Techniques; ISSN 1867-1381, 1867-8548; 2018-11-01; vol. 11, pp. 6203-6230; doi:10.5194/amt-11-6203-2018; Machine learning for improved data analysis of biological aerosol using the WIBS; S. Ruske (Centre of Atmospheric Science, SEES, University of Manchester, Manchester, UK); D. O. Topping (Centre of Atmospheric Science, SEES, University of Manchester, Manchester, UK); V. E. Foot (Defence, Science and Technology Laboratory, Porton Down, Salisbury, UK); A. P. Morse (Department of Geography and Planning, University of Liverpool, Liverpool, UK); M. W. Gallagher (Centre of Atmospheric Science, SEES, University of Manchester, Manchester, UK); abstract as given in the description field; https://www.atmos-meas-tech.net/11/6203/2018/amt-11-6203-2018.pdf |
title | Machine learning for improved data analysis of biological aerosol using the WIBS |
url | https://www.atmos-meas-tech.net/11/6203/2018/amt-11-6203-2018.pdf |
work_keys_str_mv | AT sruske machinelearningforimproveddataanalysisofbiologicalaerosolusingthewibs AT dotopping machinelearningforimproveddataanalysisofbiologicalaerosolusingthewibs AT vefoot machinelearningforimproveddataanalysisofbiologicalaerosolusingthewibs AT apmorse machinelearningforimproveddataanalysisofbiologicalaerosolusingthewibs AT mwgallagher machinelearningforimproveddataanalysisofbiologicalaerosolusingthewibs |
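
The abstract reports, for the unsupervised methods, both a classification error "on the classified data" and a fraction of data "left unclassified". The record does not say how clusters were matched to particle types, so the following is only a minimal sketch of one common convention: each cluster is assigned the majority true label of its members, and points flagged as noise are counted separately. The function name `clustering_error`, the `NOISE` constant and the toy labels are all hypothetical; the value -1 is chosen because scikit-learn's DBSCAN marks noise points that way.

```python
from collections import Counter, defaultdict

# Assumption: unclassified (noise) points carry the label -1,
# which is the convention used by scikit-learn's DBSCAN.
NOISE = -1


def clustering_error(true_labels, cluster_ids, noise=NOISE):
    """Return (error rate on classified points, fraction left unclassified).

    Each cluster is mapped to the most frequent true label among its
    members (majority vote); this mapping is an illustrative assumption,
    not necessarily the procedure used in the paper.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("inputs must have the same length")
    if len(true_labels) == 0:
        raise ValueError("inputs must be non-empty")

    # Count true labels inside every non-noise cluster.
    members = defaultdict(Counter)
    for truth, cluster in zip(true_labels, cluster_ids):
        if cluster != noise:
            members[cluster][truth] += 1

    classified = sum(sum(counts.values()) for counts in members.values())
    # Points agreeing with their cluster's majority label count as correct.
    correct = sum(counts.most_common(1)[0][1] for counts in members.values())

    unclassified_fraction = 1 - classified / len(true_labels)
    error = 1 - correct / classified if classified else float("nan")
    return error, unclassified_fraction


# Toy example (made-up labels, not data from the paper).
truth = ["pollen"] * 4 + ["fungal"] * 4 + ["bacteria"] * 2
clusters = [0, 0, 0, 1, 1, 1, 1, NOISE, 2, 2]
error, unclassified = clustering_error(truth, clusters)
print(f"error on classified data: {error:.1%}, unclassified: {unclassified:.1%}")
```

Reporting the two numbers together matters when comparing methods: a low error on the classified subset can coincide with a large share of particles that were never assigned to any cluster.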