KID - an algorithm for fast and efficient text mining used to automatically generate a database containing kinetic information of enzymes

Abstract. Background: The amount of available biological information is rapidly increasing and the focus of biological research has moved from single components to networks and even larger projects aiming at the analysis, modelling and simulation of biolo...


Bibliographic Details
Main Authors:	Schomburg Dietmar, Thielen Bernhard, Heinen Stephanie
Format:	Article
Language:	English
Published:	BMC 2010-07-01
Series:	BMC Bioinformatics
Online Access:	http://www.biomedcentral.com/1471-2105/11/375

_version_	1811299350283812864
author	Schomburg Dietmar Thielen Bernhard Heinen Stephanie
author_facet	Schomburg Dietmar Thielen Bernhard Heinen Stephanie
author_sort	Schomburg Dietmar
collection	DOAJ
description	Background: The amount of available biological information is rapidly increasing, and the focus of biological research has moved from single components to networks and even larger projects aiming at the analysis, modelling and simulation of biological networks as well as large-scale comparison of cellular properties. It is therefore essential that biological knowledge is easily accessible. However, most information is contained in the written literature in an unstructured way, so methods for the systematic extraction of knowledge directly from the primary literature have to be deployed. Description: Here we present a text mining algorithm for the extraction of kinetic information such as K_M, K_i, k_cat, etc., as well as associated information such as enzyme names, EC numbers, ligands, organisms, localisations, pH and temperatures. Using this rule- and dictionary-based approach, it was possible to extract 514,394 kinetic parameters of 13 categories (K_M, K_i, k_cat, k_cat/K_M, V_max, IC_50, S_0.5, K_d, K_a, t_1/2, pI, n_H, specific activity, V_max/K_M) from about 17 million PubMed abstracts and combine them with other data in the abstract. A manual verification of approx. 1,000 randomly chosen results yielded a recall between 51% and 84% and a precision ranging from 55% to 96%, depending on the category searched. The results were stored in a database and are available as "KID the KInetic Database" via the internet. Conclusions: The presented algorithm delivers a considerable amount of information and may therefore help to accelerate the research and the automated analysis required for today's systems biology approaches. The database obtained by analysing PubMed abstracts may be a valuable help in the field of chemical and biological kinetics. It is based entirely on text mining and therefore complements manually curated databases. The database is available at http://kid.tu-bs.de. The source code of the algorithm is provided under the GNU General Public Licence and is available on request from the author.
first_indexed	2024-04-13T06:33:55Z
format	Article
id	doaj.art-944d9563a63d40f1b844ee10884f4c83
institution	Directory of Open Access Journals
issn	1471-2105
language	English
last_indexed	2024-04-13T06:33:55Z
publishDate	2010-07-01
publisher	BMC
record_format	Article
series	BMC Bioinformatics
spelling	doaj.art-944d9563a63d40f1b844ee10884f4c83; 2022-12-22T02:57:59Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2010-07-01; 11; 1; 375; 10.1186/1471-2105-11-375; http://www.biomedcentral.com/1471-2105/11/375
title	KID - an algorithm for fast and efficient text mining used to automatically generate a database containing kinetic information of enzymes
url	http://www.biomedcentral.com/1471-2105/11/375


Similar Items