Grouping of complex substances using analytical chemistry data: A framework for quantitative evaluation and visualization.

A detailed characterization of the chemical composition of complex substances, such as products of petroleum refining and environmental mixtures, is greatly needed in exposure assessment and manufacturing. The inherent complexity and variability in the composition of complex substances obfuscate the...

Bibliographic Details
Main Authors: Melis Onel, Burcu Beykal, Kyle Ferguson, Weihsueh A Chiu, Thomas J McDonald, Lan Zhou, John S House, Fred A Wright, David A Sheen, Ivan Rusyn, Efstratios N Pistikopoulos
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2019-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0223517
author Melis Onel
Burcu Beykal
Kyle Ferguson
Weihsueh A Chiu
Thomas J McDonald
Lan Zhou
John S House
Fred A Wright
David A Sheen
Ivan Rusyn
Efstratios N Pistikopoulos
collection DOAJ
description A detailed characterization of the chemical composition of complex substances, such as products of petroleum refining and environmental mixtures, is greatly needed in exposure assessment and manufacturing. The inherent complexity and variability in the composition of complex substances obfuscate the choices for their detailed analytical characterization. Yet, in lieu of exact chemical composition of complex substances, evaluation of the degree of similarity is a sensible path toward decision-making in environmental health regulations. Grouping of similar complex substances is a challenge that can be addressed via advanced analytical methods and streamlined data analysis and visualization techniques. Here, we propose a framework with unsupervised and supervised analyses to optimally group complex substances based on their analytical features. We test two data sets of complex oil-derived substances. The first data set is from gas chromatography-mass spectrometry (GC-MS) analysis of 20 Standard Reference Materials representing crude oils and oil refining products. The second data set consists of 15 samples of various gas oils analyzed using three analytical techniques: GC-MS, GC×GC-flame ionization detection (FID), and ion mobility spectrometry-mass spectrometry (IM-MS). We use hierarchical clustering using Pearson correlation as a similarity metric for the unsupervised analysis and build classification models using the Random Forest algorithm for the supervised analysis. We present a quantitative comparative assessment of clustering results via Fowlkes-Mallows index, and classification results via model accuracies in predicting the group of an unknown complex substance. We demonstrate the effect of (i) different grouping methodologies, (ii) data set size, and (iii) dimensionality reduction on the grouping quality, and (iv) different analytical techniques on the characterization of the complex substances. 
While the complexity and variability in chemical composition are an inherent feature of complex substances, we demonstrate how the choices of the data analysis and visualization methods can impact the communication of their characteristics to delineate sufficient similarity.
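
The description names the Fowlkes-Mallows index as the measure used to compare clustering results. As an illustration only, the following minimal sketch computes that index from pair counts in plain Python. The function name `fowlkes_mallows` and the sample labels are hypothetical and are not taken from the article or its data sets.

```python
from itertools import combinations
from math import sqrt


def fowlkes_mallows(labels_true, labels_pred):
    """Fowlkes-Mallows index between two flat groupings of the same samples.

    FM = TP / sqrt((TP + FP) * (TP + FN)), counted over all sample pairs:
    TP = pairs grouped together in both labelings,
    FP = pairs together only in the predicted labeling,
    FN = pairs together only in the reference labeling.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    if tp == 0:
        return 0.0  # no shared pairs; also avoids division by zero
    return tp / sqrt((tp + fp) * (tp + fn))


# Hypothetical example: six samples, reference classes vs. cluster labels.
reference = ["crude", "crude", "diesel", "diesel", "fuel_oil", "fuel_oil"]
clusters = [0, 0, 1, 1, 1, 2]
fm_example = fowlkes_mallows(reference, clusters)
fm_perfect = fowlkes_mallows(reference, [5, 5, 7, 7, 9, 9])
```

The index ranges from 0 to 1, and identical groupings score 1 regardless of how the group labels are named. A library routine such as scikit-learn's `fowlkes_mallows_score` would normally be preferred for larger data sets; the pair loop here is quadratic in the number of samples.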
first_indexed 2024-12-19T00:04:33Z
format Article
id doaj.art-a3ca8acc301341559b33c779ebc27731
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-12-19T00:04:33Z
publishDate 2019-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-a3ca8acc301341559b33c779ebc27731; 2022-12-21T20:46:18Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2019-01-01; 14; 10; e0223517; 10.1371/journal.pone.0223517; https://doi.org/10.1371/journal.pone.0223517
title Grouping of complex substances using analytical chemistry data: A framework for quantitative evaluation and visualization.
url https://doi.org/10.1371/journal.pone.0223517