GWBNER: A named entity recognition method based on character glyph and word boundary features for Chinese EHRs

Bibliographic Details
Main Authors: Jinsong Zhang, Xiaomei Yu, Zhichao Wang, Xiangwei Zheng
Format: Article
Language: English
Published: Elsevier 2023-09-01
Series: Journal of King Saud University: Computer and Information Sciences
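Subjects: Named entity recognition; Chinese electronic health records; Medical information processing; Medical text; Natural language processing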
Online Access: http://www.sciencedirect.com/science/article/pii/S1319157823002082
author Jinsong Zhang
Xiaomei Yu
Zhichao Wang
Xiangwei Zheng
collection DOAJ
description Electronic Health Records (EHRs) contain unprecedented volumes of health-related data, such as diagnosis and treatment information. Building on named entity recognition (NER) techniques, EHR mining has become a research focus in the health domain. Nevertheless, complex medical entities remain difficult to recognize, especially in Chinese EHRs. This paper presents a Glyph and Word Boundary-based Named Entity Recognition (GWBNER) method that takes into account both character glyph and word boundary features in Chinese EHRs. Specifically, character glyphs are used to capture character-level global structural features, and word boundaries are used to extract word-level local structural features. Medical entities in Chinese EHR texts are thus recognized using rich semantic features from diverse perspectives. Finally, extensive experiments evaluate the approach, which achieves F1 scores of 0.879, 0.998 and 0.962 on three EHR datasets, respectively. The experimental results indicate that GWBNER performs best among the state-of-the-art NER models it is compared with.
first_indexed 2024-03-11T19:21:34Z
format Article
id doaj.art-dc8ad102f541483fa8ffb1fbf60fee7a
institution Directory Open Access Journal
issn 1319-1578
language English
last_indexed 2024-03-11T19:21:34Z
publishDate 2023-09-01
publisher Elsevier
record_format Article
series Journal of King Saud University: Computer and Information Sciences
spelling doaj.art-dc8ad102f541483fa8ffb1fbf60fee7a; 2023-10-07T04:33:58Z; eng; Elsevier; Journal of King Saud University: Computer and Information Sciences; 1319-1578; 2023-09-01; vol. 35, no. 8, art. 101654
Jinsong Zhang (0); Xiaomei Yu (1); Zhichao Wang (2); Xiangwei Zheng (3)
Affiliation (0, 2, 3): School of Information Science and Engineering, Shandong Normal University, Jinan 250300, China; State Key Laboratory of High-end Server & Storage Technology, Jinan 250300, China
Affiliation (1): Corresponding author at: School of Information Science and Engineering, Shandong Normal University, Jinan 250300, China; State Key Laboratory of High-end Server & Storage Technology, Jinan 250300, China
title GWBNER: A named entity recognition method based on character glyph and word boundary features for Chinese EHRs
topic Named entity recognition
Chinese electronic health records
Medical information processing
Medical text
Natural language processing
url http://www.sciencedirect.com/science/article/pii/S1319157823002082