SBLC: a hybrid model for disease named entity recognition based on semantic bidirectional LSTMs and conditional random fields

Abstract Background Disease named entity recognition (NER) is a fundamental step in information processing of medical texts. However, disease NER involves complex issues such as descriptive modifiers in actual practice. The accurate identification of disease NER is a still an open and essential rese...

Full description

Bibliographic Details
Main Authors: Kai Xu, Zhanfan Zhou, Tao Gong, Tianyong Hao, Wenyin Liu
Format: Article
Language:English
Published: BMC 2018-12-01
Series:BMC Medical Informatics and Decision Making
Subjects:
Online Access:http://link.springer.com/article/10.1186/s12911-018-0690-y
Description
Summary:Abstract Background Disease named entity recognition (NER) is a fundamental step in information processing of medical texts. However, disease NER involves complex issues such as descriptive modifiers in actual practice. The accurate identification of disease NER is a still an open and essential research problem in medical information extraction and text mining tasks. Methods A hybrid model named Semantics Bidirectional LSTM and CRF (SBLC) for disease named entity recognition task is proposed. The model leverages word embeddings, Bidirectional Long Short Term Memory networks and Conditional Random Fields. A publically available NCBI disease dataset is applied to evaluate the model through comparing with nine state-of-the-art baseline methods including cTAKES, MetaMap, DNorm, C-Bi-LSTM-CRF, TaggerOne and DNER. Results The results show that the SBLC model achieves an F1 score of 0.862 and outperforms the other methods. In addition, the model does not rely on external domain dictionaries, thus it can be more conveniently applied in many aspects of medical text processing. Conclusions According to performance comparison, the proposed SBLC model achieved the best performance, demonstrating its effectiveness in disease named entity recognition.
ISSN:1472-6947