Automated recognition of malignancy mentions in biomedical literature

Bibliographic Details
Main Authors: Liberman Mark Y, Carroll Steven, Mandel Mark A, Lerman Kevin, McDonald Ryan T, Jin Yang, Pereira Fernando C, Winters Raymond S, White Peter S
Format: Article
Language: English
Published: BMC 2006-11-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/7/492
collection DOAJ
description Abstract

Background

The rapid proliferation of biomedical text makes it increasingly difficult for researchers to identify, synthesize, and utilize developed knowledge in their fields of interest. Automated information extraction procedures can assist in the acquisition and management of this knowledge. Previous efforts in biomedical text mining have focused primarily upon named entity recognition of well-defined molecular objects such as genes, but less work has been performed to identify disease-related objects and concepts. Furthermore, promise has been tempered by an inability to efficiently scale approaches in ways that minimize manual efforts and still perform with high accuracy. Here, we have applied a machine-learning approach previously successful for identifying molecular entities to a disease concept to determine if the underlying probabilistic model effectively generalizes to unrelated concepts with minimal manual intervention for model retraining.

Results

We developed a named entity recognizer (MTag), an entity tagger for recognizing clinical descriptions of malignancy presented in text. The application uses the machine-learning technique Conditional Random Fields with additional domain-specific features. MTag was tested with 1,010 training and 432 evaluation documents pertaining to cancer genomics. Overall, our experiments resulted in 0.85 precision, 0.83 recall, and 0.84 F-measure on the evaluation set. Compared with a baseline system using string matching of text with a neoplasm term list, MTag performed with a much higher recall rate (92.1% vs. 42.1% recall) and demonstrated the ability to learn new patterns. Application of MTag to all MEDLINE abstracts yielded the identification of 580,002 unique and 9,153,340 overall mentions of malignancy. Significantly, addition of an extensive lexicon of malignancy mentions as a feature set for extraction had minimal impact in performance.

Conclusion

Together, these results suggest that the identification of disparate biomedical entity classes in free text may be achievable with high accuracy and only moderate additional effort for each new application domain.
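The F-measure quoted in the Results can be cross-checked against the quoted precision and recall. The sketch below assumes the balanced F-measure (the harmonic mean of precision and recall); the abstract does not state which variant was used, and the function name `f_measure` is illustrative rather than taken from the article.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: equal weighting of precision and recall (F1).
    Returns 0.0 when both inputs are zero to avoid division by zero.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract for the evaluation set.
reported_precision = 0.85
reported_recall = 0.83

score = f_measure(reported_precision, reported_recall)
print(f"{score:.2f}")
```

Under that assumption the result rounds to the 0.84 reported in the abstract, so the three figures are mutually consistent.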
first_indexed 2024-04-13T02:16:17Z
format Article
id doaj.art-b667c666b6af4130b41dd72868b370c1
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-04-13T02:16:17Z
publishDate 2006-11-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-b667c666b6af4130b41dd72868b370c1 | 2022-12-22T03:07:08Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2006-11-01 | 7 | 1 | 492 | 10.1186/1471-2105-7-492 | Automated recognition of malignancy mentions in biomedical literature | http://www.biomedcentral.com/1471-2105/7/492
title Automated recognition of malignancy mentions in biomedical literature
url http://www.biomedcentral.com/1471-2105/7/492