Weighted voting-based consensus clustering for chemical structure databases

Bibliographic Details
Main Authors: Saeed, Faisal; Ahmed, Ali Husain; Omar, Mohd. Shahir Shamsir; Salim, Naomie
Format: Article
Published: Springer 2014
Subjects: QH Natural history
collection ePrints
description Cluster-based compound selection is used in the lead identification process of drug discovery and design. Many clustering methods have been applied to chemical databases, but no single method obtains the best results under all circumstances. Nevertheless, little attention has been paid to combining multiple clusterings of chemical structures, an approach known as consensus clustering. Recently, consensus clustering has been used in many areas, including bioinformatics, machine learning and information theory. It can improve the robustness, stability, consistency and novelty of clustering. For chemical databases, several consensus clustering methods have been used, including co-association matrix-based, graph-based, hypergraph-based and voting-based methods. In this paper, a weighted cumulative voting-based aggregation algorithm (W-CVAA) was developed. The MDL Drug Data Report (MDDR) benchmark chemical dataset was used in the experiments and was represented by the AlogP and ECFP_4 descriptors. The clusterings were evaluated, using different criteria, by their ability to separate biologically active molecules from inactive ones in each cluster, and the effectiveness of the consensus clustering was compared with that of Ward's method, the current standard clustering method in chemoinformatics. The study indicated that weighted voting-based consensus clustering can overcome the limitations of existing voting-based methods and improve the effectiveness of combining multiple clusterings of chemical structures.
first_indexed 2024-03-05T19:56:12Z
format Article
id utm.eprints-63240
institution Universiti Teknologi Malaysia - ePrints
last_indexed 2024-03-05T19:56:12Z
spelling utm.eprints-63240 2017-06-19T03:06:39Z http://eprints.utm.my/63240/ Saeed, Faisal and Ahmed, Ali Husain and Omar, Mohd. Shahir Shamsir and Salim, Naomie (2014) Weighted voting-based consensus clustering for chemical structure databases. Journal of Computer-Aided Molecular Design, 28 (6). pp. 675-684. ISSN 0920-654X. Article, PeerReviewed. DOI: 10.1007/s10822-014-9750-2 (http://dx.doi.org/10.1007/s10822-014-9750-2)
topic QH Natural history