A machine learning approach to aerosol classification for single-particle mass spectrometry


Bibliographic Details
Main Authors: Christopoulos, Costa (Costa D.), Garimella, Sarvesh, Zawadowicz, Maria Anna, Cziczo, Daniel James
Other Authors: Massachusetts Institute of Technology. Department of Earth, Atmospheric, and Planetary Sciences
Format: Article
Language: English
Published: Copernicus GmbH 2020
Online Access: https://hdl.handle.net/1721.1/125295
Description
Summary: Compositional analysis of atmospheric and laboratory aerosols is often conducted via single-particle mass spectrometry (SPMS), an in situ and real-time analytical technique that produces mass spectra on a single-particle basis. In this study, classifiers are created using a data set of SPMS spectra to automatically differentiate particles on the basis of chemistry and size. Machine learning algorithms build a predictive model from a training set for which the aerosol type associated with each mass spectrum is known a priori. Our primary focus is on growing random forests, using feature selection to reduce dimensionality, and on evaluating the trained models with confusion matrices. In addition to classifying ∼20 unique, but chemically similar, aerosol types, models were also created to differentiate aerosols within four broader categories: fertile soils, mineral/metallic particles, biological particles, and all other aerosols. Differentiation was accomplished using ∼40 positive and negative spectral features. For the broad categorization, machine learning resulted in a classification accuracy of ∼93%. Classification of aerosols by specific type resulted in a classification accuracy of ∼87%. The model was then applied to a mixture of aerosols that was known to be a subset of the training set. Model agreement was found on the presence of secondary organic aerosol, coated and uncoated mineral dust, and fertile soil.
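
The summary mentions evaluating trained models with confusion matrices and reporting classification accuracy. As a rough illustration of that evaluation step only, the sketch below builds a confusion matrix for the four broad categories and derives the overall accuracy from it. It is not the authors' code: the label strings, the function names, and the eight toy particles are all assumptions made up for this example, and the numbers it produces have no relation to the results reported in the article.

```python
from collections import Counter

# The four broad categories named in the summary.
# The exact label strings are hypothetical.
CLASSES = ["fertile soil", "mineral/metallic", "biological", "other"]


def confusion_matrix(true_labels, predicted_labels, classes):
    """Return matrix[i][j]: number of particles whose true class is
    classes[i] and whose predicted class is classes[j]."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    index = {name: k for k, name in enumerate(classes)}
    counts = Counter(zip(true_labels, predicted_labels))
    matrix = [[0] * len(classes) for _ in classes]
    for (true_name, predicted_name), n in counts.items():
        matrix[index[true_name]][index[predicted_name]] = n
    return matrix


def accuracy(matrix):
    """Overall accuracy: correctly classified particles (the diagonal)
    divided by all particles."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total if total else 0.0


# Toy, invented labels for eight particles (not data from the study).
toy_true = [
    "fertile soil", "fertile soil",
    "mineral/metallic", "mineral/metallic",
    "biological", "biological",
    "other", "other",
]
toy_predicted = [
    "fertile soil", "mineral/metallic",
    "mineral/metallic", "mineral/metallic",
    "biological", "fertile soil",
    "other", "other",
]

toy_matrix = confusion_matrix(toy_true, toy_predicted, CLASSES)
for name, row in zip(CLASSES, toy_matrix):
    print(f"{name:>16}: {row}")
print("accuracy:", accuracy(toy_matrix))
```

Rows are true classes and columns are predicted classes, so off-diagonal entries show which categories are confused with one another, which a single accuracy figure cannot reveal.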