Ligand-Based Virtual Screening Based on the Graph Edit Distance

Bibliographic Details
Main Authors: Elena Rica, Susana Álvarez, Francesc Serratosa
Format: Article
Language: English
Published: MDPI AG, 2021-11-01
Series: International Journal of Molecular Sciences
Online Access: https://www.mdpi.com/1422-0067/22/23/12751

Abstract
Chemical compounds can be represented as attributed graphs. An attributed graph is a mathematical model of an object composed of two types of elements: nodes, which represent individual components, and edges, which represent relations between these components. Here, nodes carry pharmacophore-type descriptions and edges represent chemical bonds. To obtain the bioactivity dissimilarity between two chemical compounds, a distance between their attributed graphs can be used. The Graph Edit Distance (GED) computes such a distance; it is defined as the minimum cost of transforming one graph into another. For this dissimilarity to be meaningful, however, the transformation costs must be properly tuned. The aim of this paper is to analyse structure-based screening methods, to verify the quality of the transformation costs proposed by Harper, and to present an algorithm that learns these transformation costs so that the bioactivity dissimilarity is properly defined in a ligand-based virtual screening application. The quality of the dissimilarity is measured by classification accuracy. Six publicly available datasets (CAPST, DUD-E, GLL&GDD, NRLiSt-BDB, MUV and ULS-UDS) have been used to validate our methodology and show that, with our learned costs, we obtain the highest ratios in identifying bioactivity similarity in a structurally diverse group of molecules.
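The abstract defines the Graph Edit Distance as the cost of transforming one attributed graph into another and judges a dissimilarity by classification accuracy. As a rough illustration only (not the authors' implementation, and not their learned costs), the sketch below computes an exact GED by brute force over all node assignments for tiny graphs with constant, hypothetical edit costs, then uses it in a 1-nearest-neighbour classifier. The names `AttributedGraph`, `EditCosts`, `graph_edit_distance` and `nearest_neighbour_accuracy`, the node labels ("Ar", "HBD", "HBA", "Hyd") and all cost values are assumptions made for this example. Exhaustive enumeration is only feasible for a handful of nodes.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Dict, FrozenSet, Hashable, List, Optional, Tuple

Node = Hashable
Edge = FrozenSet[Node]


@dataclass
class AttributedGraph:
    nodes: Dict[Node, str]  # node id -> pharmacophore-type label (hypothetical)
    edges: Dict[Edge, str]  # undirected edge {u, v} -> bond label (hypothetical)


@dataclass
class EditCosts:
    # Constant illustrative costs (assumption); the paper instead learns its costs.
    node_sub: float = 1.0    # relabel a node
    node_indel: float = 2.0  # delete or insert a node
    edge_sub: float = 0.5    # relabel an edge
    edge_indel: float = 1.0  # delete or insert an edge


def path_cost(g1: AttributedGraph, g2: AttributedGraph,
              mapping: Dict[Node, Optional[Node]], costs: EditCosts) -> float:
    """Cost of the edit path induced by mapping each g1 node to a g2 node or None (deletion)."""
    total = 0.0
    for u, label in g1.nodes.items():
        v = mapping[u]
        if v is None:
            total += costs.node_indel            # node deletion
        elif label != g2.nodes[v]:
            total += costs.node_sub              # node substitution
    used = {v for v in mapping.values() if v is not None}
    total += costs.node_indel * (len(g2.nodes) - len(used))  # node insertions

    matched = set()
    for edge, label in g1.edges.items():
        u, w = tuple(edge)
        image = frozenset((mapping[u], mapping[w]))
        if mapping[u] is not None and mapping[w] is not None and image in g2.edges:
            matched.add(image)
            if label != g2.edges[image]:
                total += costs.edge_sub          # edge substitution
        else:
            total += costs.edge_indel            # edge deletion
    total += costs.edge_indel * (len(g2.edges) - len(matched))  # edge insertions
    return total


def graph_edit_distance(g1: AttributedGraph, g2: AttributedGraph, costs: EditCosts) -> float:
    """Exact GED by enumerating every injective partial node mapping (tiny graphs only)."""
    sources = list(g1.nodes)
    targets: List[Optional[Node]] = list(g2.nodes) + [None] * len(sources)
    return min(path_cost(g1, g2, dict(zip(sources, choice)), costs)
               for choice in set(permutations(targets, len(sources))))


def nearest_neighbour_accuracy(train: List[Tuple[AttributedGraph, str]],
                               test: List[Tuple[AttributedGraph, str]],
                               costs: EditCosts) -> float:
    """Fraction of test graphs whose GED-nearest training graph has the same class."""
    correct = 0
    for graph, label in test:
        _, predicted = min(train, key=lambda item: graph_edit_distance(graph, item[0], costs))
        correct += predicted == label
    return correct / len(test)


# Hypothetical reduced graphs: Ar = aromatic, HBD/HBA = H-bond donor/acceptor, Hyd = hydrophobic.
chain = {frozenset({0, 1}): "single", frozenset({1, 2}): "single"}
g_a = AttributedGraph({0: "Ar", 1: "HBD", 2: "HBA"}, dict(chain))
g_b = AttributedGraph({0: "Ar", 1: "HBD", 2: "Hyd"}, dict(chain))
g_c = AttributedGraph({0: "Ar"}, {})

print(graph_edit_distance(g_a, g_b, EditCosts()))              # 1.0
print(graph_edit_distance(g_a, g_b, EditCosts(node_sub=3.0)))  # 3.0
print(nearest_neighbour_accuracy([(g_a, "active"), (g_c, "inactive")],
                                 [(g_b, "active")], EditCosts()))  # 1.0
```

Raising `node_sub` from 1.0 to 3.0 changes the distance between the same two graphs from 1.0 to 3.0, which is why the edit costs must be tuned (or learned) before the distance can serve as a bioactivity dissimilarity. Practical screening systems replace the exhaustive search with pruned exact search or approximate matching.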

Additional Details
Affiliation (all authors): Departament d’Enginyeria Informàtica i Matemàtiques, Universitat Rovira i Virgili, 43007 Tarragona, Spain
Volume: 22, Issue: 23, Article: 12751
DOI: 10.3390/ijms222312751
ISSN: 1661-6596, 1422-0067
Collection: Directory of Open Access Journals (DOAJ)
Keywords: virtual screening; molecular similarity; extended reduced graph; structure activity relationships; machine learning; graph edit distance