Evaluation of machine learning algorithms for classification of primary biological aerosol using a new UV-LIF spectrometer
Main Authors: | , , , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | Copernicus Publications, 2017-03-01 |
Series: | Atmospheric Measurement Techniques |
Online Access: | http://www.atmos-meas-tech.net/10/695/2017/amt-10-695-2017.pdf |
Summary: | Characterisation of bioaerosols has important implications within
environment and public health sectors. Recent developments in ultraviolet
light-induced fluorescence (UV-LIF) detectors such as the Wideband Integrated
Bioaerosol Spectrometer (WIBS) and the newly introduced Multiparameter
Bioaerosol Spectrometer (MBS) have allowed for the real-time collection of
fluorescence, size and morphology measurements for the purpose of
discriminating between bacteria, fungal spores and pollen.<br><br>This new generation of instruments has enabled ever larger data sets to be
compiled with the aim of studying more complex environments. In real world
data sets, particularly those from an urban environment, the population may
be dominated by non-biological fluorescent interferents, bringing into
question the accuracy of measurements of quantities such as concentrations.
It is therefore imperative that we validate the performance of different
algorithms which can be used for the task of classification.<br><br>For unsupervised learning we tested hierarchical agglomerative clustering
with various different linkages. For supervised learning, 11 methods were
tested, including decision trees, ensemble methods (random forests, gradient
boosting and AdaBoost), two implementations for support vector machines (libsvm and liblinear), Gaussian methods (Gaussian naïve Bayesian,
quadratic and linear discriminant analysis), the <i>k</i>-nearest neighbours
algorithm and artificial neural networks.<br><br>The methods were applied to two different data sets produced using the new
MBS, which provides multichannel UV-LIF
fluorescence signatures for single airborne biological particles. The first
data set contained mixed polystyrene latex spheres (PSLs) and the second contained a variety of
laboratory-generated aerosol.<br><br>Clustering in general performs slightly worse than the supervised learning
methods, correctly classifying, at best, only 67.6 and 91.1 % for the
two data sets respectively. For supervised learning the gradient boosting
algorithm was found to be the most effective, on average correctly
classifying 82.8 and 98.27 % of the testing data, respectively, across
the two data sets.<br><br>A possible alternative to gradient boosting is neural networks. We do however
note that this method requires much more user input than the other methods,
and we suggest that further research should be conducted using this method,
especially on parallelised hardware such as GPUs, which would allow
larger networks to be trained and could possibly yield better results.<br><br>We also saw that some methods, such as clustering, failed to utilise the
additional shape information provided by the instrument, whilst for others,
such as the decision trees, ensemble methods and neural networks,
improved performance could be attained with the inclusion of such
information. |
---|---|
ISSN: | 1867-1381 1867-8548 |
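
The Summary reports the percentage of particles "correctly classified" by clustering, but the record does not say how clusters were matched to particle classes. The sketch below shows one common convention, assumed here rather than taken from the article: each cluster is assigned the majority label among its members, and the score is the share of particles that agree with their cluster's label. The function name `clustering_accuracy` and the example labels are hypothetical.

```python
from collections import Counter, defaultdict


def clustering_accuracy(true_labels, cluster_ids):
    """Return the percentage of items matching their cluster's majority label.

    Assumption: each cluster is scored by its most frequent true label.
    This is an illustrative convention, not necessarily the article's method.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("true_labels and cluster_ids must have equal length")
    if not true_labels:
        raise ValueError("at least one labelled item is required")

    # Count how often each true label occurs inside each cluster.
    label_counts = defaultdict(Counter)
    for label, cluster in zip(true_labels, cluster_ids):
        label_counts[cluster][label] += 1

    # Items carrying the majority label of their cluster count as correct.
    correct = sum(counts.most_common(1)[0][1] for counts in label_counts.values())
    return 100.0 * correct / len(true_labels)


# Hypothetical example: six particles, three clusters.
labels = ["bacteria", "bacteria", "pollen", "pollen", "fungal", "fungal"]
clusters = [0, 0, 0, 1, 1, 2]
score = clustering_accuracy(labels, clusters)
```

Note that this majority-vote score can only rise as the number of clusters grows, so it is best compared between runs that use the same number of clusters.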