Extended similarity indices: the benefits of comparing more than two objects simultaneously. Part 1: Theory and characteristics†

Abstract Quantification of the similarity of objects is a key concept in many areas of computational science. This includes cheminformatics, where molecular similarity is usually quantified based on binary fingerprints. While there is a wide selection of available molecular representations and simil...

Bibliographic Details
Main Authors: Ramón Alain Miranda-Quintana, Dávid Bajusz, Anita Rácz, Károly Héberger
Format: Article
Language: English
Published: BMC 2021-04-01
Series: Journal of Cheminformatics
Subjects: Comparisons; Rankings; Extended similarity indices; Consistency; Molecular fingerprints; ANOVA
Online Access: https://doi.org/10.1186/s13321-021-00505-3
_version_ 1818961186568273920
author Ramón Alain Miranda-Quintana
Dávid Bajusz
Anita Rácz
Károly Héberger
author_sort Ramón Alain Miranda-Quintana
collection DOAJ
description Abstract Quantification of the similarity of objects is a key concept in many areas of computational science. This includes cheminformatics, where molecular similarity is usually quantified based on binary fingerprints. While there is a wide selection of available molecular representations and similarity metrics, there have been no previous efforts to extend the computational framework of similarity calculations to the simultaneous comparison of more than two objects (molecules). The present study bridges this gap by introducing a straightforward computational framework for comparing multiple objects at the same time and providing extended formulas for as many similarity metrics as possible. In the binary case (i.e. when comparing two molecules pairwise) these reduce naturally to their well-known formulas. We provide a detailed analysis of the effects of various parameters on the similarity values calculated by the extended formulas. The extended similarity indices are entirely general and do not depend on the fingerprints used. Two types of analysis of variance (ANOVA) help to understand the main features of the indices: (i) ANOVA of mean similarity indices; (ii) ANOVA of sum of ranking differences (SRD). Practical aspects and applications of the extended similarity indices are detailed in the accompanying paper: Miranda-Quintana et al. J Cheminform. 2021. https://doi.org/10.1186/s13321-021-00504-4 . Python code for calculating the extended similarity metrics is freely available at: https://github.com/ramirandaq/MultipleComparisons .
first_indexed 2024-12-20T12:09:26Z
format Article
id doaj.art-578817a198ac4555993eaf33f99497e9
institution Directory Open Access Journal
issn 1758-2946
language English
last_indexed 2024-12-20T12:09:26Z
publishDate 2021-04-01
publisher BMC
record_format Article
series Journal of Cheminformatics
spelling doaj.art-578817a198ac4555993eaf33f99497e9
2022-12-21T19:41:17Z
eng
BMC
Journal of Cheminformatics
1758-2946
2021-04-01
13 1 1 18
10.1186/s13321-021-00505-3
Extended similarity indices: the benefits of comparing more than two objects simultaneously. Part 1: Theory and characteristics†
Ramón Alain Miranda-Quintana (0)
Dávid Bajusz (1)
Anita Rácz (2)
Károly Héberger (3)
(0) Department of Chemistry, University of Florida
(1) Medicinal Chemistry Research Group, Research Centre for Natural Sciences
(2) Plasma Chemistry Research Group, ELKH Research Centre for Natural Sciences
(3) Plasma Chemistry Research Group, ELKH Research Centre for Natural Sciences
https://doi.org/10.1186/s13321-021-00505-3
Comparisons
Rankings
Extended similarity indices
Consistency
Molecular fingerprints
ANOVA
title Extended similarity indices: the benefits of comparing more than two objects simultaneously. Part 1: Theory and characteristics†
title_sort extended similarity indices the benefits of comparing more than two objects simultaneously part 1 theory and characteristics†
topic Comparisons
Rankings
Extended similarity indices
Consistency
Molecular fingerprints
ANOVA
url https://doi.org/10.1186/s13321-021-00505-3