Life beyond the Tanimoto coefficient: similarity measures for interaction fingerprints

Bibliographic Details
Main Authors: Anita Rácz, Dávid Bajusz, Károly Héberger
Format: Article
Language: English
Published: BMC 2018-10-01
Series: Journal of Cheminformatics
Subjects: Virtual screening; Interaction fingerprint; Similarity metrics; SRD; ANOVA; FPKit
Online Access: http://link.springer.com/article/10.1186/s13321-018-0302-y
author Anita Rácz
Dávid Bajusz
Károly Héberger
collection DOAJ
description Abstract
Background: Interaction fingerprints (IFPs) have repeatedly been shown to be valuable tools in virtual screening for identifying novel hit compounds that can subsequently be optimized into drug candidates. As a complementary method to ligand docking, IFPs can be applied to quantify the similarity of predicted binding poses to a reference binding pose. For this purpose, a large number of similarity metrics can be applied, and various parameters of the IFPs themselves can be customized. In a large-scale comparison, we have assessed the effect of similarity metrics and IFP configurations on a number of virtual screening scenarios with ten different protein targets and thousands of molecules. In particular, we studied the effect of considering general interaction definitions (such as Any Contact, Backbone Interaction and Sidechain Interaction), the effect of filtering methods, and the different groups of similarity metrics.
Results: The performances were primarily compared based on AUC values, but we have also used the original similarity data to compare the similarity metrics with several statistical tests and the novel, robust sum of ranking differences (SRD) algorithm. With SRD, we can evaluate the consistency (or concordance) of the various similarity metrics with an ideal reference metric, which is provided by data fusion from the existing metrics. Different aspects of IFP configurations and similarity metrics were examined based on SRD values with analysis of variance (ANOVA) tests.
Conclusion: A general approach is provided that can be applied for the reliable interpretation and usage of similarity measures with interaction fingerprints. Metrics that are viable alternatives to the commonly used Tanimoto coefficient were identified based on a comparison with an ideal reference metric (consensus). A careful selection of the applied bits (interaction definitions) and IFP filtering rules can improve the results of virtual screening (in terms of their agreement with the consensus metric). The open-source Python package FPKit was introduced for the similarity calculations and IFP filtering; it is available at: https://github.com/davidbajusz/fpkit.
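As an illustration of the two ideas named in the abstract, the following is a minimal sketch, not the FPKit implementation and not the authors' code. It assumes IFPs are given as equal-length strings of '0' and '1', and it assumes that SRD is computed as the sum of absolute rank differences against a reference ranking that the caller supplies (for example, a consensus obtained by averaging several metrics). The function names tanimoto, ranks and srd are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def tanimoto(a: str, b: str) -> float:
    """Tanimoto coefficient of two equal-length bit strings ('0'/'1')."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(x == "1" and y == "1" for x, y in zip(a, b))
    either = sum(x == "1" or y == "1" for x, y in zip(a, b))
    # Assumed convention: two all-zero fingerprints score 0.0.
    return both / either if either else 0.0


def ranks(values: Sequence[float]) -> list[float]:
    """Ascending ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def srd(scores: Sequence[float], reference: Sequence[float]) -> float:
    """Sum of absolute rank differences between one metric and a reference."""
    if len(scores) != len(reference):
        raise ValueError("scores and reference must have the same length")
    return sum(abs(r - q) for r, q in zip(ranks(scores), ranks(reference)))


if __name__ == "__main__":
    reference_pose = "110100"  # hypothetical reference IFP
    docked_poses = ["110100", "100100", "001011"]  # hypothetical docked poses
    similarities = [tanimoto(reference_pose, p) for p in docked_poses]
    print(similarities)
    # A metric that ranks the poses exactly like the reference has SRD 0.
    print(srd(similarities, similarities))
```

A smaller SRD value means that a metric orders the poses more like the reference does; how the reference is built (here left to the caller) is a modelling choice.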
first_indexed 2024-12-21T09:49:05Z
format Article
id doaj.art-61304046ad544191a4765316aaf90949
institution Directory Open Access Journal
issn 1758-2946
language English
last_indexed 2024-12-21T09:49:05Z
publishDate 2018-10-01
publisher BMC
record_format Article
series Journal of Cheminformatics
spelling doaj.art-61304046ad544191a4765316aaf90949 2022-12-21T19:08:14Z eng BMC Journal of Cheminformatics 1758-2946 2018-10-01 101112 10.1186/s13321-018-0302-y
Life beyond the Tanimoto coefficient: similarity measures for interaction fingerprints
Anita Rácz [0], Dávid Bajusz [1], Károly Héberger [2]
[0] Plasma Chemistry Research Group, Research Centre for Natural Sciences, Hungarian Academy of Sciences
[1] Medicinal Chemistry Research Group, Research Centre for Natural Sciences, Hungarian Academy of Sciences
[2] Plasma Chemistry Research Group, Research Centre for Natural Sciences, Hungarian Academy of Sciences
http://link.springer.com/article/10.1186/s13321-018-0302-y
Virtual screening; Interaction fingerprint; Similarity metrics; SRD; ANOVA; FPKit
title Life beyond the Tanimoto coefficient: similarity measures for interaction fingerprints
topic Virtual screening
Interaction fingerprint
Similarity metrics
SRD
ANOVA
FPKit
url http://link.springer.com/article/10.1186/s13321-018-0302-y