GWBNER: A named entity recognition method based on character glyph and word boundary features for Chinese EHRs
Electronic Health Records (EHRs) contain unprecedented volumes of health-related data, such as diagnosis and treatment information. Building on named entity recognition (NER) technologies, EHR mining has become a research focus in the health domain. Nevertheless, the complex medical entities are...
Main Authors: | Jinsong Zhang, Xiaomei Yu, Zhichao Wang, Xiangwei Zheng |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2023-09-01 |
Series: | Journal of King Saud University: Computer and Information Sciences |
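Subjects: | Named entity recognition; Chinese electronic health records; Medical information processing; Medical text; Natural language processing |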
Online Access: | http://www.sciencedirect.com/science/article/pii/S1319157823002082 |
_version_ | 1797663905466548224 |
---|---|
author | Jinsong Zhang; Xiaomei Yu; Zhichao Wang; Xiangwei Zheng |
author_sort | Jinsong Zhang |
collection | DOAJ |
description | Electronic Health Records (EHRs) contain unprecedented volumes of health-related data, such as diagnosis and treatment information. Building on named entity recognition (NER) technologies, EHR mining has become a research focus in the health domain. Nevertheless, complex medical entities are challenging to recognize, especially in Chinese EHRs. In this paper, a Glyph and Word Boundary-based Named Entity Recognition (GWBNER) method is presented, which takes into account both Chinese character glyph features and word boundary features in Chinese EHRs. Specifically, character glyphs are used to capture character-level global structural features, and word boundaries are used to extract word-level local structural features in Chinese. Therefore, medical entities in Chinese EHR texts are fully recognized with rich semantic features from diverse perspectives. Finally, we conduct extensive experiments to evaluate the approach, achieving F1 scores of 0.879, 0.998 and 0.962 on three EHR datasets, respectively. The experimental results show that the GWBNER method outperforms the state-of-the-art models in the NER community. |
first_indexed | 2024-03-11T19:21:34Z |
format | Article |
id | doaj.art-dc8ad102f541483fa8ffb1fbf60fee7a |
institution | Directory Open Access Journal |
issn | 1319-1578 |
language | English |
last_indexed | 2024-03-11T19:21:34Z |
publishDate | 2023-09-01 |
publisher | Elsevier |
record_format | Article |
series | Journal of King Saud University: Computer and Information Sciences |
spelling | doaj.art-dc8ad102f541483fa8ffb1fbf60fee7a; 2023-10-07T04:33:58Z; eng; Elsevier; Journal of King Saud University: Computer and Information Sciences; 1319-1578; 2023-09-01; 35; 8; 101654; GWBNER: A named entity recognition method based on character glyph and word boundary features for Chinese EHRs; Jinsong Zhang [0]; Xiaomei Yu [1]; Zhichao Wang [2]; Xiangwei Zheng [3]; [0] School of Information Science and Engineering, Shandong Normal University, Jinan 250300, China, and State Key Laboratory of High-end Server & Storage Technology, Jinan 250300, China; [1] Corresponding author at: School of Information Science and Engineering, Shandong Normal University, Jinan 250300, China. School of Information Science and Engineering, Shandong Normal University, Jinan 250300, China, and State Key Laboratory of High-end Server & Storage Technology, Jinan 250300, China; [2] School of Information Science and Engineering, Shandong Normal University, Jinan 250300, China, and State Key Laboratory of High-end Server & Storage Technology, Jinan 250300, China; [3] School of Information Science and Engineering, Shandong Normal University, Jinan 250300, China, and State Key Laboratory of High-end Server & Storage Technology, Jinan 250300, China; Electronic Health Records (EHRs) contain unprecedented volumes of health-related data, such as diagnosis and treatment information. Building on named entity recognition (NER) technologies, EHR mining has become a research focus in the health domain. Nevertheless, complex medical entities are challenging to recognize, especially in Chinese EHRs. In this paper, a Glyph and Word Boundary-based Named Entity Recognition (GWBNER) method is presented, which takes into account both Chinese character glyph features and word boundary features in Chinese EHRs. Specifically, character glyphs are used to capture character-level global structural features, and word boundaries are used to extract word-level local structural features in Chinese. Therefore, medical entities in Chinese EHR texts are fully recognized with rich semantic features from diverse perspectives. Finally, we conduct extensive experiments to evaluate the approach, achieving F1 scores of 0.879, 0.998 and 0.962 on three EHR datasets, respectively. The experimental results show that the GWBNER method outperforms the state-of-the-art models in the NER community.; http://www.sciencedirect.com/science/article/pii/S1319157823002082; Named entity recognition; Chinese electronic health records; Medical information processing; Medical text; Natural language processing |
title | GWBNER: A named entity recognition method based on character glyph and word boundary features for Chinese EHRs |
topic | Named entity recognition; Chinese electronic health records; Medical information processing; Medical text; Natural language processing |
url | http://www.sciencedirect.com/science/article/pii/S1319157823002082 |