Automated extraction of chemical structure information from digital raster images

Bibliographic Details
Main Authors: Shedden Kerby A, Rosania Gus R, Park Jungkap, Nguyen Mandee, Lyu Naesung, Saitou Kazuhiro
Format: Article
Language: English
Published: BMC 2009-02-01
Series: Chemistry Central Journal
Online Access: http://journal.chemistrycentral.com/content/3/1/4

Abstract

Background
To search for chemical structures in research articles, the diagrams or text representing molecules must be translated into a standard chemical file format compatible with cheminformatic search engines. However, chemical information in research articles is often presented as analog diagrams of chemical structures embedded in digital raster images. Several software systems have been developed to automate the analog-to-digital conversion of chemical structure diagrams in scientific research articles, but their algorithmic performance and utility in cheminformatic research have not been investigated.

Results
This paper aims to critically review these systems and to report our recent development of ChemReader, a fully automated tool for extracting chemical structure diagrams from research articles and converting them into standard, searchable chemical file formats. Basic algorithms for recognizing the lines and letters that represent bonds and atoms in chemical structure diagrams can be run independently and in sequence from a graphical user interface, and their parameters can be readily changed, to facilitate further development tailored to a specific chemical database annotation scheme. Compared with existing programs such as OSRA, Kekule, and CLiDE, our results indicate that ChemReader outperforms the other systems on several sets of sample images from diverse sources, in terms of both the rate of correct outputs and the accuracy of extracting molecular substructure patterns.

Conclusion
The availability of ChemReader as a cheminformatic tool for extracting chemical structure information from digital raster images allows research and development groups to enrich their chemical structure databases by annotating entries with published research articles. Given its stable performance and high accuracy, ChemReader may be accurate enough to annotate chemical databases with links to scientific research articles.
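
The Results mention algorithms that recognize the lines representing bonds in a raster diagram, but the abstract does not specify how ChemReader does this. As an illustration only, the minimal Python sketch below uses the classic Hough transform, a standard line-detection technique; it should not be read as ChemReader's implementation. It assumes the image has already been binarized into a set of foreground (x, y) pixel coordinates, and the function name `hough_peaks` and its parameters (`theta_steps`, `min_votes`, `rho_tol`, `theta_tol`) are hypothetical.

```python
import math
from collections import Counter


def hough_peaks(pixels, theta_steps=180, min_votes=10, rho_tol=2, theta_tol=3):
    """Detect straight lines in a binarized image with a basic Hough transform.

    pixels: iterable of (x, y) coordinates of foreground (ink) pixels.
    Returns (rho, theta, votes) for lines x*cos(theta) + y*sin(theta) = rho,
    strongest first, with theta in radians in [0, pi).
    """
    # Accumulator: each pixel votes for every discretized line through it,
    # so collinear pixels pile their votes into the same (rho, theta) bin.
    votes = Counter()
    for x, y in pixels:
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, t)] += 1

    def near(rho, t, pr, pt):
        # Neighbouring bins describe (almost) the same line.
        if abs(rho - pr) <= rho_tol and abs(t - pt) <= theta_tol:
            return True
        # A bin with theta near pi is the same line as theta near 0 with rho negated.
        return abs(rho + pr) <= rho_tol and theta_steps - abs(t - pt) <= theta_tol

    # Strongest bins first; ties broken by (rho, t) so results are deterministic.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    peaks = []
    for (rho, t), n in ranked:
        if n < min_votes:
            break
        # Simple non-maximum suppression: skip a bin if an already accepted
        # (equal- or higher-ranked) peak describes nearly the same line.
        if any(near(rho, t, pr, pt) for pr, pt, _ in peaks):
            continue
        peaks.append((rho, t, n))
    return [(rho, math.pi * t / theta_steps, n) for rho, t, n in peaks]


if __name__ == "__main__":
    # Toy "diagram": one horizontal and one vertical bond drawn as 1-pixel lines.
    pixels = {(x, 5) for x in range(20)} | {(3, y) for y in range(20)}
    for rho, theta, n in hough_peaks(pixels, min_votes=18):
        print(f"rho={rho}, theta={math.degrees(theta):.0f} deg, votes={n}")
```

Lines come back in normal form, so a vertical bond at x = 3 may be reported as rho = -3 with theta near 179 degrees, which is the same line. Because `min_votes` is counted in pixels, it has to be tuned to image resolution and the shortest expected bond, in the spirit of the abstract's point that algorithm parameters should be easy to change. A full pipeline would still need to cut these infinite lines into bond segments and recognize atom letters separately.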
ISSN: 1752-153X
DOI: 10.1186/1752-153X-3-4