Evaluation of machine learning algorithms for classification of primary biological aerosol using a new UV-LIF spectrometer

Characterisation of bioaerosols has important implications within environment and public health sectors. Recent developments in ultraviolet light-induced fluorescence (UV-LIF) detectors such as the Wideband Integrated Bioaerosol Spectrometer (WIBS) and the newly introduced Multiparameter Bioaerosol...

Bibliographic Details
Main Authors: S. Ruske, D. O. Topping, V. E. Foot, P. H. Kaye, W. R. Stanley, I. Crawford, A. P. Morse, M. W. Gallagher
Format: Article
Language: English
Published: Copernicus Publications 2017-03-01
Series: Atmospheric Measurement Techniques
Online Access: http://www.atmos-meas-tech.net/10/695/2017/amt-10-695-2017.pdf
author S. Ruske
D. O. Topping
V. E. Foot
P. H. Kaye
W. R. Stanley
I. Crawford
A. P. Morse
M. W. Gallagher
collection DOAJ
description Characterisation of bioaerosols has important implications within environment and public health sectors. Recent developments in ultraviolet light-induced fluorescence (UV-LIF) detectors such as the Wideband Integrated Bioaerosol Spectrometer (WIBS) and the newly introduced Multiparameter Bioaerosol Spectrometer (MBS) have allowed for the real-time collection of fluorescence, size and morphology measurements for the purpose of discriminating between bacteria, fungal spores and pollen.

This new generation of instruments has enabled ever larger data sets to be compiled with the aim of studying more complex environments. In real-world data sets, particularly those from an urban environment, the population may be dominated by non-biological fluorescent interferents, bringing into question the accuracy of measurements of quantities such as concentrations. It is therefore imperative that we validate the performance of different algorithms which can be used for the task of classification.

For unsupervised learning we tested hierarchical agglomerative clustering with various linkages. For supervised learning, 11 methods were tested, including decision trees, ensemble methods (random forests, gradient boosting and AdaBoost), two implementations for support vector machines (libsvm and liblinear), Gaussian methods (Gaussian naïve Bayesian, quadratic and linear discriminant analysis), the k-nearest neighbours algorithm and artificial neural networks.

The methods were applied to two different data sets produced using the new MBS, which provides multichannel UV-LIF fluorescence signatures for single airborne biological particles. The first data set contained mixed PSLs and the second contained a variety of laboratory-generated aerosol.

Clustering in general performs slightly worse than the supervised learning methods, correctly classifying, at best, only 67.6 and 91.1 % for the two data sets respectively. For supervised learning the gradient boosting algorithm was found to be the most effective, on average correctly classifying 82.8 and 98.27 % of the testing data, respectively, across the two data sets.

A possible alternative to gradient boosting is neural networks. We do however note that this method requires much more user input than the other methods, and we suggest that further research should be conducted using this method, especially using parallelised hardware such as the GPU, which would allow larger networks to be trained and could possibly yield better results.

We also saw that some methods, such as clustering, failed to utilise the additional shape information provided by the instrument, whilst for others, such as the decision trees, ensemble methods and neural networks, improved performance could be attained with the inclusion of such information.
first_indexed 2024-12-13T05:18:06Z
format Article
id doaj.art-50a21f8938ad4f7ca877d0bb5a5a6572
institution Directory Open Access Journal
issn 1867-1381
1867-8548
language English
last_indexed 2024-12-13T05:18:06Z
publishDate 2017-03-01
publisher Copernicus Publications
record_format Article
series Atmospheric Measurement Techniques
spelling doaj.art-50a21f8938ad4f7ca877d0bb5a5a6572 (2022-12-21T23:58:23Z); eng; Copernicus Publications; Atmospheric Measurement Techniques; ISSN 1867-1381, 1867-8548; 2017-03-01; vol. 10, no. 2, pp. 695-708; doi:10.5194/amt-10-695-2017
Evaluation of machine learning algorithms for classification of primary biological aerosol using a new UV-LIF spectrometer
S. Ruske (Centre for Atmospheric Science, SEAES, University of Manchester, Manchester, UK)
D. O. Topping (Centre for Atmospheric Science, SEAES, University of Manchester, Manchester, UK)
V. E. Foot (Defence, Science and Technology Lab., Porton Down, Salisbury, Wiltshire, SP4 0JQ, UK)
P. H. Kaye (Particle Instruments Research Group, University of Hertfordshire, Hatfield, AL10 9AB, UK)
W. R. Stanley (Particle Instruments Research Group, University of Hertfordshire, Hatfield, AL10 9AB, UK)
I. Crawford (Centre for Atmospheric Science, SEAES, University of Manchester, Manchester, UK)
A. P. Morse (Department of Geography and Planning, University of Liverpool, Liverpool, UK)
M. W. Gallagher (Centre for Atmospheric Science, SEAES, University of Manchester, Manchester, UK)
http://www.atmos-meas-tech.net/10/695/2017/amt-10-695-2017.pdf
title Evaluation of machine learning algorithms for classification of primary biological aerosol using a new UV-LIF spectrometer
url http://www.atmos-meas-tech.net/10/695/2017/amt-10-695-2017.pdf