Comparative Analysis of Binary Similarity Measures for Compound Identification in Mass Spectrometry-Based Metabolomics

Bibliographic Details
Main Authors: Seongho Kim, Ikuko Kato, Xiang Zhang
Format: Article
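Language: English
Published: MDPI AG, 2022-07-01
Series: Metabolites
Subjects: binary similarity measure; compound identification; EI; ESI; mass spectrometry; untargeted metabolomics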
Online Access:https://www.mdpi.com/2218-1989/12/8/694
Collection: Directory of Open Access Journals (DOAJ)
ISSN: 2218-1989
Volume/Issue: 12(8), article 694
DOI: 10.3390/metabo12080694
Affiliations:
Seongho Kim: Biostatistics and Bioinformatics Core, Karmanos Cancer Institute, Department of Oncology, School of Medicine, Wayne State University, Detroit, MI 48201, USA
Ikuko Kato: Department of Oncology and Pathology, School of Medicine, Wayne State University, Detroit, MI 48201, USA
Xiang Zhang: Department of Chemistry, University of Louisville, Louisville, KY 40292, USA

Abstract

Compound identification is a critical step in untargeted metabolomics. Its most important procedure is calculating the similarity between experimental mass spectra and either predicted mass spectra or the mass spectra in a mass spectral library. Unlike continuous similarity measures, binary similarity measures have not been studied for their performance in compound identification, even though the well-known Jaccard similarity measure has been widely used without proper evaluation. The objective of this study is thus to evaluate the performance of binary similarity measures for compound identification in untargeted metabolomics. Fifteen binary similarity measures, including the well-known Jaccard, Dice, Sokal–Sneath, Cosine, and Simpson measures, were selected, and their performance in compound identification was assessed using both electron ionization (EI) and electrospray ionization (ESI) mass spectra. Our theoretical evaluations show that compound identification accuracy is exactly the same among the Jaccard, Dice, 3W-Jaccard, Sokal–Sneath, and Kulczynski measures, between the Cosine and Hellinger measures, and between the McConnaughey and Driver–Kroeber measures; these equivalences were confirmed in practice using mass spectral libraries. In the mass spectrum-based evaluation, the best-performing similarity measures were the McConnaughey and Driver–Kroeber measures for EI mass spectra and the Cosine and Hellinger measures for ESI mass spectra. The most robust similarity measure was the Fager–McGowan measure, which was the second-best performing measure for both EI and ESI mass spectra.
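Identical identification accuracy within each group is what one would expect if every measure in a group is a strictly monotone transformation of the others, since all of them would then rank library spectra in the same order. The Python sketch below illustrates this under stated assumptions: spectra are reduced to sets of nominal m/z values, a, b, and c are the usual 2×2 counts (peaks in both spectra, query-only peaks, reference-only peaks), and the formulas are textbook forms from surveys of binary similarity measures, which may not match the paper's exact definitions. The names `binarize`, `counts`, `rank_library`, `MEASURES`, and `GROUPS`, the intensity threshold, and the toy spectra are hypothetical.

```python
import math
from typing import Callable, Dict, FrozenSet, List, Tuple

# A binary spectrum: the set of nominal m/z values at which a peak is present.
Spectrum = FrozenSet[int]


def binarize(peaks: Dict[int, float], threshold: float = 0.0) -> Spectrum:
    """Hypothetical rule: a peak is 'present' if its intensity exceeds threshold."""
    return frozenset(mz for mz, intensity in peaks.items() if intensity > threshold)


def counts(query: Spectrum, ref: Spectrum) -> Tuple[int, int, int]:
    """a = m/z present in both, b = query only, c = reference only."""
    return len(query & ref), len(query - ref), len(ref - query)


# Assumed textbook forms of the measures (the paper's exact formulas may differ).
# None of them uses d (m/z absent from both spectra). Every score is oriented so
# that larger means more similar; Hellinger is a distance, so it is negated.
MEASURES: Dict[str, Callable[[int, int, int], float]] = {
    # Jaccard family: each is a strictly increasing function of J = a/(a+b+c).
    "Jaccard": lambda a, b, c: a / (a + b + c),
    "Dice": lambda a, b, c: 2 * a / (2 * a + b + c),  # 2J/(1+J)
    "3W-Jaccard": lambda a, b, c: 3 * a / (3 * a + b + c),  # 3J/(1+2J)
    "Sokal-Sneath": lambda a, b, c: a / (a + 2 * b + 2 * c),  # J/(2-J)
    "Kulczynski": lambda a, b, c: math.inf if b + c == 0 else a / (b + c),  # J/(1-J)
    # Cosine family: the Hellinger distance 2*sqrt(1 - cosine) shrinks as cosine grows.
    "Cosine": lambda a, b, c: a / math.sqrt((a + b) * (a + c)),
    "Hellinger": lambda a, b, c: -2 * math.sqrt(
        max(0.0, 1 - a / math.sqrt((a + b) * (a + c)))
    ),
    # McConnaughey = 2 * Driver-Kroeber - 1, an increasing linear map.
    "McConnaughey": lambda a, b, c: (a * a - b * c) / ((a + b) * (a + c)),
    # Driver-Kroeber (a/2)(1/(a+b) + 1/(a+c)), rewritten as a single division.
    "Driver-Kroeber": lambda a, b, c: a * (2 * a + b + c) / (2 * (a + b) * (a + c)),
}

GROUPS: List[List[str]] = [
    ["Jaccard", "Dice", "3W-Jaccard", "Sokal-Sneath", "Kulczynski"],
    ["Cosine", "Hellinger"],
    ["McConnaughey", "Driver-Kroeber"],
]


def rank_library(query: Spectrum, library: Dict[str, Spectrum],
                 measure: Callable[[int, int, int], float]) -> List[str]:
    """Library entries ordered from best to worst match; ties broken by name."""
    scores = {name: measure(*counts(query, ref)) for name, ref in library.items()}
    return sorted(scores, key=lambda name: (-scores[name], name))


if __name__ == "__main__":
    # Toy spectra invented for illustration (not data from the paper).
    query = frozenset({39, 41, 43, 55, 57, 67, 69, 71, 83, 85})
    library = {
        "X": frozenset({39, 41, 43, 55, 57}),
        "Y": frozenset({39, 41, 43, 55, 57, 67, 69, 71, 91, 105, 115, 128, 141}),
        "Z": frozenset({41, 43, 200, 201}),
    }
    for group in GROUPS:
        for name in group:
            print(f"{name:>15}: {rank_library(query, library, MEASURES[name])}")
```

On the toy data, every measure in the Jaccard group ranks Y first, while the Cosine and McConnaughey groups rank X first: the measures agree within a group but not across groups. Breaking ties by name keeps equal scores from depending on dictionary order.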