Handwriting recognition and retrieval for chemical structural formulas

Chemicals with similar structures often have similar chemical properties, chemical reactions, and even physical properties. Many drug discovery projects therefore need to search for similar chemical structures of drug-like compounds that are worthy of further synthetic investigation....

Bibliographic Details
Main Author: Tang, Peng
Other Authors: Hui Siu Cheung
Format: Thesis
Language: English
Published: 2015
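Subjects:
DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition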
Online Access: http://hdl.handle.net/10356/63297
_version_ 1811679712149241856
author Tang, Peng
author2 Hui Siu Cheung
author_facet Hui Siu Cheung
Tang, Peng
author_sort Tang, Peng
collection NTU
description Chemicals with similar structures often have similar chemical properties, chemical reactions, and even physical properties. Many drug discovery projects therefore need to search for similar chemical structures of drug-like compounds that are worthy of further synthetic investigation. However, most current search engines work well only for text-based information and provide poor support for chemical structural search. Moreover, a chemical structural search requires a chemical structural query as input. Compared with handwriting-based input, traditional template-based input is much more complicated and less intuitive. With the growing popularity of touch-based devices, handwriting-based input has become much more important. Because chemical structural formulas are spatially complex, recognizing handwritten chemical structural formulas both precisely and efficiently is challenging. This research investigates techniques to support the recognition and retrieval of handwritten chemical structural formulas, and makes the following contributions:
• Handwritten Chemical Symbol Recognition. We proposed CF44, a set of 44 chemical symbol features that model the writing process, visual appearance, and contextual environment of handwritten chemical symbols. We also proposed a handwritten chemical symbol recognition approach based on a Support Vector Machine and the CF44 feature set.
• Progressive Chemical Structural Analysis. We proposed a chemical structural analysis approach to support progressive recognition of handwritten chemical structural formulas. The approach uses a Chemical Structural Graph to model chemical structural formulas. We also proposed a novel connected bond analysis method and a ring closure detection method to support the recognition of complex chemical structures such as connected bonds and cyclic ring structures.
• Chemical Structural Similarity Retrieval. We proposed two approaches for chemical structural similarity retrieval, which retrieve chemical structural formulas that are functionally similar to the query. The two approaches are based on the Vector Space Model and Formal Concept Analysis, respectively. We also proposed a web-based chemical retrieval system that uses the publish-subscribe model for efficient chemical structural similarity retrieval.
first_indexed 2024-10-01T03:13:31Z
format Thesis
id ntu-10356/63297
institution Nanyang Technological University
language English
last_indexed 2024-10-01T03:13:31Z
publishDate 2015
record_format dspace
spelling ntu-10356/63297 2023-03-04T00:35:30Z Handwriting recognition and retrieval for chemical structural formulas Tang, Peng Hui Siu Cheung School of Computer Engineering DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition Doctor of Philosophy (SCE) 2015-05-12T04:59:14Z 2015-05-12T04:59:14Z 2015 2015 Thesis Tang, P. (2015). Handwriting recognition and retrieval for chemical structural formulas. Doctoral thesis, Nanyang Technological University, Singapore. http://hdl.handle.net/10356/63297 en 178 p. application/pdf
spellingShingle DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
Tang, Peng
Handwriting recognition and retrieval for chemical structural formulas
title Handwriting recognition and retrieval for chemical structural formulas
title_full Handwriting recognition and retrieval for chemical structural formulas
title_fullStr Handwriting recognition and retrieval for chemical structural formulas
title_full_unstemmed Handwriting recognition and retrieval for chemical structural formulas
title_short Handwriting recognition and retrieval for chemical structural formulas
title_sort handwriting recognition and retrieval for chemical structural formulas
topic DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
url http://hdl.handle.net/10356/63297
work_keys_str_mv AT tangpeng handwritingrecognitionandretrievalforchemicalstructuralformulas