A hybrid approach for named entity recognition in Chinese electronic medical record
Abstract Background With the rapid spread of electronic medical records and the arrival of medical big data era, the application of natural language processing technology in biomedicine has become a hot research topic. Methods In this paper, firstly, BiLSTM-CRF model is applied to medical named enti...
Main Authors: | Bin Ji, Rui Liu, Shasha Li, Jie Yu, Qingbo Wu, Yusong Tan, Jiaju Wu |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2019-04-01 |
Series: | BMC Medical Informatics and Decision Making |
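Subjects: | BiLSTM-CRF, Attention, Chinese electronic medical record, Named entity recognition, Drug dictionary |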
Online Access: | http://link.springer.com/article/10.1186/s12911-019-0767-2 |
_version_ | 1819107912551759872 |
---|---|
author | Bin Ji, Rui Liu, Shasha Li, Jie Yu, Qingbo Wu, Yusong Tan, Jiaju Wu |
author_sort | Bin Ji |
collection | DOAJ |
description | Abstract Background With the rapid spread of electronic medical records and the arrival of the medical big data era, the application of natural language processing technology in biomedicine has become a hot research topic. Methods In this paper, a BiLSTM-CRF model is first applied to medical named entity recognition on Chinese electronic medical records. According to the characteristics of Chinese electronic medical records, a low-dimensional word vector is obtained for each word, sentence by sentence. The word vectors are then input to the BiLSTM to extract sentence features automatically, and the CRF performs sentence-level word tagging. Second, an attention mechanism is added between the BiLSTM and the CRF to construct an Attention-BiLSTM-CRF model, which can leverage document-level information to alleviate tagging inconsistency. In addition, this paper proposes an entity auto-correct algorithm to rectify entities according to historical entity information. Finally, a drug dictionary and post-processing rules are built to rectify entities and further improve performance. Results The final F1 scores of the BiLSTM-CRF and Attention-BiLSTM-CRF models on the given test dataset are 90.15% and 90.82%, respectively, both higher than 89.26%, the best F1 score achieved by other approaches on that test dataset. Conclusion Our approach can be used to recognize medical named entities in Chinese electronic medical records and achieves state-of-the-art performance on the given test dataset. |
first_indexed | 2024-12-22T03:01:34Z |
format | Article |
id | doaj.art-f4aa03e3907c4aafa06886c5fb37179d |
institution | Directory of Open Access Journals |
issn | 1472-6947 |
language | English |
last_indexed | 2024-12-22T03:01:34Z |
publishDate | 2019-04-01 |
publisher | BMC |
record_format | Article |
series | BMC Medical Informatics and Decision Making |
spelling | doaj.art-f4aa03e3907c4aafa06886c5fb37179d; 2022-12-21T18:41:10Z; eng; BMC; BMC Medical Informatics and Decision Making; 1472-6947; 2019-04-01; 19S2149158; 10.1186/s12911-019-0767-2; A hybrid approach for named entity recognition in Chinese electronic medical record; Bin Ji (College of Computer, National University of Defense Technology); Rui Liu (Department of Oncology, the Second Xiangya Hospital of Central South University); Shasha Li (College of Computer, National University of Defense Technology); Jie Yu (College of Computer, National University of Defense Technology); Qingbo Wu (College of Computer, National University of Defense Technology); Yusong Tan (College of Computer, National University of Defense Technology); Jiaju Wu (Institute of Computer Application, China Academic of Engineering Physics); http://link.springer.com/article/10.1186/s12911-019-0767-2; BiLSTM-CRF; Attention; Chinese electronic medical record; Named entity recognition; Drug dictionary |
title | A hybrid approach for named entity recognition in Chinese electronic medical record |
topic | BiLSTM-CRF, Attention, Chinese electronic medical record, Named entity recognition, Drug dictionary |
url | http://link.springer.com/article/10.1186/s12911-019-0767-2 |
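
The abstract reports F1 scores for sentence-level tagging but this record does not say how they were computed. As a rough illustration only, the sketch below computes entity-level precision, recall, and F1 from BIO tag sequences under a strict-match assumption (an entity counts as correct only when both its boundaries and its type match). The BIO scheme, the strict-match criterion, the function names `extract_spans` and `entity_f1`, and the labels `drug` and `symptom` are assumptions made for this example, not details taken from the article.

```python
from typing import List, Optional, Set, Tuple

# (start index, end index exclusive, entity type); hypothetical representation
Span = Tuple[int, int, str]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence, e.g. ["B-drug", "I-drug", "O"].

    An "I-" tag that does not continue a span of the same type is ignored.
    """
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    # A trailing sentinel "O" flushes a span that runs to the end of the sentence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((start, i, label))
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under strict span-and-type matching."""
    gold_spans = extract_spans(gold)
    pred_spans = extract_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy sentence: the predicted symptom span is one token too long,
    # so only the drug entity counts as a strict match.
    gold = ["B-drug", "I-drug", "O", "B-symptom", "O"]
    pred = ["B-drug", "I-drug", "O", "B-symptom", "I-symptom"]
    print(entity_f1(gold, pred))
```

In the toy sentence only one of the two predicted entities matches exactly, so a boundary error costs both precision and recall; this is the kind of error that dictionary lookups and post-processing rules, as mentioned in the abstract, aim to reduce.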