A hybrid approach for named entity recognition in Chinese electronic medical record

Abstract Background With the rapid spread of electronic medical records and the arrival of the medical big data era, the application of natural language processing technology in biomedicine has become a hot research topic. Methods In this paper, a BiLSTM-CRF model is first applied to medical named enti...

Bibliographic Details
Main Authors: Bin Ji, Rui Liu, Shasha Li, Jie Yu, Qingbo Wu, Yusong Tan, Jiaju Wu
Format: Article
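Language: English
Published: BMC 2019-04-01
Series: BMC Medical Informatics and Decision Making
Subjects: BiLSTM-CRF; Attention; Chinese electronic medical record; Named entity recognition; Drug dictionary
Online Access: http://link.springer.com/article/10.1186/s12911-019-0767-2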
_version_ 1819107912551759872
author Bin Ji
Rui Liu
Shasha Li
Jie Yu
Qingbo Wu
Yusong Tan
Jiaju Wu
author_sort Bin Ji
collection DOAJ
description Abstract Background With the rapid spread of electronic medical records and the arrival of the medical big data era, the application of natural language processing technology in biomedicine has become a hot research topic. Methods First, a BiLSTM-CRF model is applied to medical named entity recognition on Chinese electronic medical records. Based on the characteristics of Chinese electronic medical records, a low-dimensional word vector is obtained for each word, sentence by sentence. The word vectors are then fed into the BiLSTM, which automatically extracts sentence features, and the CRF performs sentence-level word tagging. Second, an attention mechanism is added between the BiLSTM and the CRF to construct an Attention-BiLSTM-CRF model, which can leverage document-level information to alleviate tagging inconsistency. In addition, the paper proposes an entity auto-correct algorithm that rectifies entities according to historical entity information. Finally, a drug dictionary and post-processing rules are built to rectify entities and further improve performance. Results The final F1 scores of the BiLSTM-CRF and Attention-BiLSTM-CRF models on the given test dataset are 90.15% and 90.82%, respectively, both higher than 89.26%, the best F1 score on the test dataset other than ours. Conclusion Our approach can be used to recognize medical named entities in Chinese electronic medical records and achieves state-of-the-art performance on the given test dataset.
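The sentence-level tagging step mentioned in the description can be illustrated with a small sketch. The abstract does not say how the CRF layer is decoded; the code below assumes the standard Viterbi algorithm for a linear-chain CRF, and the function name `viterbi_decode`, the tag order (O, B, I), and all scores are hypothetical values chosen for illustration, not taken from the paper. In a BiLSTM-CRF the emission scores would come from the BiLSTM output for each word.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag sequence for one sentence.

    emissions[t][j]   : score of tag j at position t (hypothetical BiLSTM output)
    transitions[i][j] : score of moving from tag i to tag j
    """
    if not emissions:
        return []
    num_tags = len(transitions)
    # best[j] holds the score of the best path that ends in tag j so far.
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(num_tags):
            # Pick the previous tag that gives the best score when moving into tag j.
            prev = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Start from the best final tag and follow the backpointers to the first position.
    path = [max(range(num_tags), key=lambda j: best[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical tag set: 0 = O (outside), 1 = B (entity start), 2 = I (inside entity).
TRANSITIONS = [
    [0.0, 0.0, -100.0],  # O -> I is heavily penalized: an entity cannot start with I
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
EMISSIONS = [
    [2.0, 1.5, 0.0],
    [0.0, 0.0, 3.0],
    [1.0, 0.0, 2.0],
]
TAGS = viterbi_decode(EMISSIONS, TRANSITIONS)
```

With these toy scores, picking the best tag per word independently would give O, I, I, which is not a valid entity. The transition penalty makes the decoder return B, I, I instead, which is the kind of sequence-level consistency a CRF layer adds on top of per-word scores.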
first_indexed 2024-12-22T03:01:34Z
format Article
id doaj.art-f4aa03e3907c4aafa06886c5fb37179d
institution Directory Open Access Journal
issn 1472-6947
language English
last_indexed 2024-12-22T03:01:34Z
publishDate 2019-04-01
publisher BMC
record_format Article
series BMC Medical Informatics and Decision Making
spelling doaj.art-f4aa03e3907c4aafa06886c5fb37179d 2022-12-21T18:41:10Z eng BMC BMC Medical Informatics and Decision Making 1472-6947 2019-04-01 19 S2 149 158 10.1186/s12911-019-0767-2
A hybrid approach for named entity recognition in Chinese electronic medical record
Bin Ji (0) College of Computer, National University of Defense Technology
Rui Liu (1) Department of Oncology, the Second Xiangya Hospital of Central South University
Shasha Li (2) College of Computer, National University of Defense Technology
Jie Yu (3) College of Computer, National University of Defense Technology
Qingbo Wu (4) College of Computer, National University of Defense Technology
Yusong Tan (5) College of Computer, National University of Defense Technology
Jiaju Wu (6) Institute of Computer Application, China Academic of Engineering Physics
http://link.springer.com/article/10.1186/s12911-019-0767-2
BiLSTM-CRF
Attention
Chinese electronic medical record
Named entity recognition
Drug dictionary
title A hybrid approach for named entity recognition in Chinese electronic medical record
topic BiLSTM-CRF
Attention
Chinese electronic medical record
Named entity recognition
Drug dictionary
url http://link.springer.com/article/10.1186/s12911-019-0767-2