Web-based retrieval system for chemical structural formulas

Bibliographic Details
Main Author: Neo, Lok Tuan
Other Authors: Hui Siu Cheung
Format: Final Year Project (FYP)
Language: English
Published: 2013
Online Access: http://hdl.handle.net/10356/55016
Description
Summary: The drug discovery process relies heavily on chemical substructure and similarity search results for lead identification. Researchers often pool the results of both search types to obtain a larger set of lead molecules for drug suitability evaluation in subsequent stages of the drug discovery process. However, existing chemical search engines require users to issue similarity and substructure queries separately, and they display search results only when the search is complete. In this project, a web-based chemical search engine is proposed and implemented that delivers both types of search results to users as soon as a match is found. Two approaches are proposed to support efficient chemical search:

• Effective Substructure Screening - By combining substructure information with chemical functional groups and chemical bonds, the accuracy of the screening step of a substructure search can be improved. Evaluation results showed that the combined chemical features improve precision, recall and F1 scores for almost all test queries.

• Publisher-Subscriber Infrastructure - Using the Publisher-Subscriber pattern in conjunction with an effective molecule filtering process, various types of chemical search can be carried out simultaneously and results can be delivered to users efficiently. Evaluation results indicate that the proposed infrastructure scales linearly on larger chemical databases, with significant speed-ups in search time when cached results are used to filter molecules for substructure search.

The two approaches work together to enhance the efficiency and effectiveness of chemical structural formula search. This report discusses the proposed substructure screening process and the proposed publisher-subscriber infrastructure, and evaluates the performance of both.
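The two ideas in the summary, feature-based screening and publisher-subscriber delivery, can be illustrated with a minimal Python sketch. It is not the project's implementation. It assumes each molecule is reduced to a set of hypothetical feature labels (bonds and functional groups), uses the Tanimoto coefficient as a stand-in similarity measure, and omits the exact substructure verification that would normally follow screening. The names `Broker`, `passes_screen`, `tanimoto` and `search` are invented for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, List

# Assumption: a molecule is represented only by a set of feature labels
# (e.g. bond types and functional groups). Real systems use fingerprints.
Features = FrozenSet[str]


def passes_screen(query: Features, candidate: Features) -> bool:
    """Screening: a candidate can contain the query as a substructure
    only if it has every feature of the query (necessary, not sufficient)."""
    return query <= candidate


def tanimoto(a: Features, b: Features) -> float:
    """Tanimoto coefficient: shared features over all features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class Broker:
    """Minimal in-process publisher-subscriber hub (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, molecule_id: str) -> None:
        # Each hit is pushed to subscribers immediately, not after the scan ends.
        for callback in self._subscribers[topic]:
            callback(molecule_id)


def search(database: Dict[str, Features], query: Features,
           broker: Broker, threshold: float = 0.5) -> None:
    """Run both search types in a single pass over the database."""
    for mol_id, feats in database.items():
        if passes_screen(query, feats):
            # A full engine would verify the match exactly before publishing.
            broker.publish("substructure", mol_id)
        if tanimoto(query, feats) >= threshold:
            broker.publish("similarity", mol_id)


if __name__ == "__main__":
    db = {
        "ethanol": frozenset({"C-C", "C-O", "hydroxyl"}),
        "ethane": frozenset({"C-C"}),
    }
    broker = Broker()
    broker.subscribe("substructure", lambda m: print("substructure hit:", m))
    broker.subscribe("similarity", lambda m: print("similarity hit:", m))
    search(db, frozenset({"C-O", "hydroxyl"}), broker)
```

Because subscribers receive each hit as it is published, a client could render partial results while the scan is still running, and one pass over the database serves both query types.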