Extended similarity indices: the benefits of comparing more than two objects simultaneously. Part 1: Theory and characteristics†
Abstract Quantification of the similarity of objects is a key concept in many areas of computational science. This includes cheminformatics, where molecular similarity is usually quantified based on binary fingerprints. While there is a wide selection of available molecular representations and simil...
Main Authors: | Ramón Alain Miranda-Quintana, Dávid Bajusz, Anita Rácz, Károly Héberger |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2021-04-01 |
Series: | Journal of Cheminformatics |
Subjects: | Comparisons; Rankings; Extended similarity indices; Consistency; Molecular fingerprints; ANOVA |
Online Access: | https://doi.org/10.1186/s13321-021-00505-3 |
_version_ | 1818961186568273920 |
---|---|
author | Ramón Alain Miranda-Quintana; Dávid Bajusz; Anita Rácz; Károly Héberger |
author_sort | Ramón Alain Miranda-Quintana |
collection | DOAJ |
description | Abstract Quantification of the similarity of objects is a key concept in many areas of computational science. This includes cheminformatics, where molecular similarity is usually quantified based on binary fingerprints. While there is a wide selection of available molecular representations and similarity metrics, there were no previous efforts to extend the computational framework of similarity calculations to the simultaneous comparison of more than two objects (molecules) at the same time. The present study bridges this gap, by introducing a straightforward computational framework for comparing multiple objects at the same time and providing extended formulas for as many similarity metrics as possible. In the binary case (i.e. when comparing two molecules pairwise) these are naturally reduced to their well-known formulas. We provide a detailed analysis on the effects of various parameters on the similarity values calculated by the extended formulas. The extended similarity indices are entirely general and do not depend on the fingerprints used. Two types of variance analysis (ANOVA) help to understand the main features of the indices: (i) ANOVA of mean similarity indices; (ii) ANOVA of sum of ranking differences (SRD). Practical aspects and applications of the extended similarity indices are detailed in the accompanying paper: Miranda-Quintana et al. J Cheminform. 2021. https://doi.org/10.1186/s13321-021-00504-4 . Python code for calculating the extended similarity metrics is freely available at: https://github.com/ramirandaq/MultipleComparisons . |
first_indexed | 2024-12-20T12:09:26Z |
format | Article |
id | doaj.art-578817a198ac4555993eaf33f99497e9 |
institution | Directory Open Access Journal |
issn | 1758-2946 |
language | English |
last_indexed | 2024-12-20T12:09:26Z |
publishDate | 2021-04-01 |
publisher | BMC |
record_format | Article |
series | Journal of Cheminformatics |
spelling | doaj.art-578817a198ac4555993eaf33f99497e9; 2022-12-21T19:41:17Z; eng; BMC; Journal of Cheminformatics; 1758-2946; 2021-04-01; 13; 1; 1; 18; 10.1186/s13321-021-00505-3; Extended similarity indices: the benefits of comparing more than two objects simultaneously. Part 1: Theory and characteristics†; Ramón Alain Miranda-Quintana (Department of Chemistry, University of Florida); Dávid Bajusz (Medicinal Chemistry Research Group, Research Centre for Natural Sciences); Anita Rácz (Plasma Chemistry Research Group, ELKH Research Centre for Natural Sciences); Károly Héberger (Plasma Chemistry Research Group, ELKH Research Centre for Natural Sciences); https://doi.org/10.1186/s13321-021-00505-3; Comparisons; Rankings; Extended similarity indices; Consistency; Molecular fingerprints; ANOVA |
title | Extended similarity indices: the benefits of comparing more than two objects simultaneously. Part 1: Theory and characteristics† |
topic | Comparisons; Rankings; Extended similarity indices; Consistency; Molecular fingerprints; ANOVA |
url | https://doi.org/10.1186/s13321-021-00505-3 |