GWBNER: A named entity recognition method based on character glyph and word boundary features for Chinese EHRs

Bibliographic Details
Main Authors: Jinsong Zhang, Xiaomei Yu, Zhichao Wang, Xiangwei Zheng
Format: Article
Language: English
Published: Elsevier 2023-09-01
Series: Journal of King Saud University: Computer and Information Sciences
Online Access: http://www.sciencedirect.com/science/article/pii/S1319157823002082
Description
Summary: Electronic Health Records (EHRs) contain unprecedented volumes of health-related data, such as diagnosis and treatment information. Building on named entity recognition (NER) technologies, EHR mining has become a research focus in the health domain. Nevertheless, complex medical entities are challenging to recognize, especially in Chinese EHRs. In this paper, a Glyph and Word Boundary-based Named Entity Recognition (GWBNER) method is presented, which takes into account both Chinese character glyph and word boundary features in Chinese EHRs. Specifically, character glyphs are used to capture character-level global structural features, and word boundaries are used to extract word-level local structural features. As a result, medical entities in Chinese EHR texts can be fully recognized using rich semantic features drawn from diverse perspectives. Finally, we conduct extensive experiments to evaluate the proposed approach, achieving F1 scores of 0.879, 0.998 and 0.962 on three EHR datasets, respectively. The experimental results demonstrate that GWBNER outperforms state-of-the-art models in the NER community.
ISSN: 1319-1578
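The summary does not say how GWBNER derives its word boundary features. As a rough illustration only, the sketch below assumes a hypothetical medical lexicon and uses forward maximum matching to give each character a B/M/E/S boundary tag (begin, middle, end, single), one common way to expose word-level boundaries to a character-based tagger. The function `bmes_boundary_tags`, the `max_len` parameter, the tag-id mapping, and the lexicon are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set

# Hypothetical tag vocabulary; ids could index an embedding table whose
# vectors are concatenated with character (or glyph) embeddings.
BOUNDARY_TAG_IDS: Dict[str, int] = {"B": 0, "M": 1, "E": 2, "S": 3}


def bmes_boundary_tags(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Tag each character with B/M/E/S using greedy forward maximum matching.

    B = begins a multi-character lexicon word, M = inside it, E = ends it,
    S = single character (including characters covered by no lexicon word).
    """
    tags: List[str] = []
    i, n = 0, len(sentence)
    while i < n:
        match_len = 1
        # Try the longest candidate starting at position i first (length >= 2).
        for length in range(min(max_len, n - i), 1, -1):
            if sentence[i:i + length] in lexicon:
                match_len = length
                break
        if match_len == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (match_len - 2) + ["E"])
        i += match_len
    return tags


def boundary_tag_ids(tags: List[str]) -> List[int]:
    """Map boundary tags to integer ids for an embedding lookup."""
    return [BOUNDARY_TAG_IDS[t] for t in tags]


if __name__ == "__main__":
    # "The patient has a history of coronary heart disease" (toy example).
    lexicon = {"患者", "冠心病", "病史"}
    tags = bmes_boundary_tags("患者有冠心病史", lexicon)
    print(tags)                    # ['B', 'E', 'S', 'B', 'M', 'E', 'S']
    print(boundary_tag_ids(tags))  # [0, 2, 3, 0, 1, 2, 3]
```

Greedy matching keeps 冠心病 and drops the overlapping 病史, which is one reason lexicon-enhanced NER models often retain every matching word instead of a single segmentation; whether GWBNER does so is not stated in the summary.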