Supporting the curation of biological databases with reusable text mining.

Bibliographic Details
Main Authors:	Miotto, O; Tan, T; Brusic, V
Format:	Journal article
Language:	English
Published:	2005
Institution:	University of Oxford
Collection:	OXFORD
Identifier:	oxford-uuid:9b11857b-41a3-4ded-9e7c-74dab1302ba6

Description

Curators of biological databases transfer knowledge from scientific publications, a laborious and expensive manual process. Machine learning algorithms can reduce the workload of curators by filtering relevant biomedical literature, though their widespread adoption will depend on the availability of intuitive tools that can be configured for a variety of tasks. We propose a new method for supporting curators by means of document categorization, and describe the architecture of a curator-oriented tool implementing this method using techniques that require no expertise in computational linguistics or programming. To demonstrate the feasibility of this approach, we prototyped an application of this method to support a real curation task: identifying PubMed abstracts that contain allergen cross-reactivity information. We tested the performance of two classifier algorithms, classification and regression trees (CART) and artificial neural networks (ANN), applied to both composite and single-word features and using several feature scoring functions. Both classifiers exceeded our performance targets, with the ANN classifier yielding the best results. These results show that the method we propose can deliver the level of performance needed to assist database curation.
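The record does not say which feature scoring functions were used. As an illustration only, the sketch below assumes one common choice, the chi-square statistic, and uses it to rank single-word features of abstracts labelled relevant or irrelevant to a curation task. The function names and the four example sentences are hypothetical and are not data from the study. Words selected this way could then be passed to a classifier such as CART or an ANN, which is not shown.

```python
import re
from collections import Counter


def tokenize(text: str) -> set[str]:
    """Lower-case single-word features; each word counts once per document."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def chi_square_scores(docs: list[str], labels: list[bool]) -> dict[str, float]:
    """Score each word by the chi-square statistic of its 2x2 table
    (word present/absent x document relevant/irrelevant).
    Assumption: chi-square stands in for the unnamed scoring functions."""
    n = len(docs)
    n_pos = sum(labels)
    df_all: Counter[str] = Counter()  # documents containing the word
    df_pos: Counter[str] = Counter()  # relevant documents containing the word
    for text, relevant in zip(docs, labels):
        words = tokenize(text)
        df_all.update(words)
        if relevant:
            df_pos.update(words)

    scores: dict[str, float] = {}
    for word, present in df_all.items():
        a = df_pos[word]       # present, relevant
        b = present - a        # present, irrelevant
        c = n_pos - a          # absent, relevant
        d = n - n_pos - b      # absent, irrelevant
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[word] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


# Hypothetical toy abstracts, labelled True if they mention cross-reactivity.
docs = [
    "IgE cross-reactivity between birch pollen and apple allergens",
    "cross-reactivity of shrimp tropomyosin with mite allergens",
    "crystal structure of a bacterial protease",
    "gene expression in yeast under heat stress",
]
labels = [True, True, False, False]

scores = chi_square_scores(docs, labels)
top = sorted(scores, key=scores.get, reverse=True)[:3]
print(top)
```

With these toy labels, the words that occur only in the two relevant sentences ("cross", "reactivity", "allergens") receive the highest score. A word that is split evenly between the two classes, such as "of", scores zero.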
