Chinese medical entity recognition based on the dual-branch TENER model

Abstract Background Named Entity Recognition (NER) is a long-standing fundamental problem in various research fields of Natural Language Processing (NLP) and has been practiced in many application scenarios. However, the application results of NER methods in Chinese electronic medical records (EMRs)...


Bibliographic Details
Main Authors: Hui Peng, Zhichang Zhang, Dan Liu, Xiaohui Qin
Format: Article
Language: English
Published: BMC 2023-07-01
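Series: BMC Medical Informatics and Decision Making
Subjects: Electronic medical records; Named entity recognition; TENER; Char-Entity-Transformer; Dual-branch
Online Access: https://doi.org/10.1186/s12911-023-02243-y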
author Hui Peng
Zhichang Zhang
Dan Liu
Xiaohui Qin
collection DOAJ
description Abstract
Background: Named Entity Recognition (NER) is a long-standing fundamental problem in Natural Language Processing (NLP) and has been applied in many scenarios. However, NER methods perform unsatisfactorily on Chinese electronic medical records (EMRs), mainly for two reasons: (1) existing methods do not account for the impact of medical terminology on recognition, resulting in poor model performance; (2) existing methods do not fully exploit the Chinese language features contained in EMRs, resulting in poor model robustness. Solving these two problems is therefore urgent for NER on EMRs.
Methods: This paper proposes a TENER-based radical feature and entity augmentation model for NER in Chinese EMRs. The TENER model is first used in the pre-training stage to extract deep semantic information from each layer of the feature extractor. In the decoder, recognition of medical entity boundaries and recognition of entity categories are split into two branch tasks.
Results: We compare the proposed model with existing models on several datasets using the F1 score. The experiments show that our model achieves the best F1 scores of 82.67%, 74.37%, and 70.16% on the CCKS2019, ERTCMM, and CEMR datasets, respectively. In the CMeEE challenge, our model surpassed the top-3 with an F1 score of 68.39%.
Conclusions: Our proposed model is the first to divide the NER task into two branch tasks: entity boundary recognition and entity type recognition. First, medical entity dictionary information is integrated into TENER to capture the features of professional terms in Chinese EMRs. Second, Chinese radical features extracted from the EMRs by a CNN are added to the entity category recognition task. Finally, the effectiveness of the model is validated on four datasets, where it achieves competitive results.
first_indexed 2024-03-12T21:08:43Z
format Article
id doaj.art-21ca7cafe44246b5b58546672aeebfa6
institution Directory Open Access Journal
issn 1472-6947
language English
last_indexed 2024-03-12T21:08:43Z
publishDate 2023-07-01
publisher BMC
record_format Article
series BMC Medical Informatics and Decision Making
spelling doaj.art-21ca7cafe44246b5b58546672aeebfa6 2023-07-30T11:17:17Z eng BMC
BMC Medical Informatics and Decision Making, ISSN 1472-6947, 2023-07-01, vol. 23, no. 1, pp. 1-14, 10.1186/s12911-023-02243-y
Chinese medical entity recognition based on the dual-branch TENER model
Hui Peng, College of Computer Science and Engineering, Northwest Normal University
Zhichang Zhang, College of Computer Science and Engineering, Northwest Normal University
Dan Liu, College of Computer Science and Engineering, Northwest Normal University
Xiaohui Qin, College of Computer Science and Engineering, Northwest Normal University
https://doi.org/10.1186/s12911-023-02243-y
Electronic medical records; Named entity recognition; TENER; Char-Entity-Transformer; Dual-branch
title Chinese medical entity recognition based on the dual-branch TENER model
topic Electronic medical records
Named entity recognition
TENER
Char-Entity-Transformer
Dual-branch
url https://doi.org/10.1186/s12911-023-02243-y