Chinese Named Entity Recognition Method for Domain-Specific Text

Bibliographic Details
Main Authors: He Liu, Yuekun Ma, Chang Gao, Jia Qi, Dezheng Zhang
Format: Article
Language: English
Published: Faculty of Mechanical Engineering in Slavonski Brod, Faculty of Electrical Engineering in Osijek, Faculty of Civil Engineering in Osijek 2023-01-01
Series: Tehnički Vjesnik
Subjects: attention mechanism, data augmentation, domain text, meta-learning, named entity recognition
Online Access: https://hrcak.srce.hr/file/446400
collection DOAJ
description Chinese named entity recognition (NER) is a critical task in natural language processing that aims to identify and classify named entities in text. However, because domain texts are highly specialized and large-scale labelled datasets are scarce, NER methods trained on public-domain corpora perform poorly on domain texts. This paper proposes a named entity recognition method that adaptively incorporates sentence semantic information into character semantic information through an attention mechanism and a gating mechanism, enhancing the entity feature representation while attenuating the noise introduced by irrelevant character information. In addition, to address the lack of large-scale labelled samples, we used data self-augmentation methods to expand the training set. Because the low-quality samples generated by data self-augmentation can harm the model, we also introduced a Weighted Strategy. Experiments on the TCM prescriptions corpus showed that our method achieved higher F1 values than the comparison methods.
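The record gives no equations for the fusion step, so the following is only a minimal sketch of one way an attention score and a gate could jointly control how much of a sentence vector is mixed into each character vector. The function name `gated_fusion`, the scalar per-character gate, the scaled dot-product attention, and the additive update are all assumptions made for illustration, not the authors' formulation.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(char_vecs, sent_vec, w_gate, b_gate=0.0):
    """Mix a sentence vector into each character vector (hypothetical sketch).

    char_vecs: list of d-dimensional character vectors
    sent_vec:  one d-dimensional sentence vector
    w_gate:    2d gate weights applied to the concatenation [char; sentence]
    Returns (fused_vectors, attention_weights).
    """
    scale = math.sqrt(len(sent_vec))
    # Attention: how relevant each character is to the sentence vector.
    attn = softmax([dot(c, sent_vec) / scale for c in char_vecs])
    fused = []
    for c, a in zip(char_vecs, attn):
        # Gate in (0, 1): how much sentence information this character admits.
        g = sigmoid(dot(w_gate, c + sent_vec) + b_gate)
        # Additive update: characters with low attention or a closed gate
        # stay close to their original representation.
        fused.append([ci + g * a * si for ci, si in zip(c, sent_vec)])
    return fused, attn


if __name__ == "__main__":
    chars = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    fused, attn = gated_fusion(chars, [1.0, 0.0], [0.0, 0.0, 0.0, 0.0])
    print(attn)
    print(fused)
```

In a trained model `w_gate` and `b_gate` would be learned and the vectors would come from an encoder; here they are fixed by hand so the behaviour of the gate and the attention weights can be inspected directly.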
first_indexed 2024-04-24T09:05:41Z
id doaj.art-fd5c675e5056442c8bd74b3e9307e0bc
institution Directory Open Access Journal
issn 1330-3651
1848-6339
last_indexed 2024-04-24T09:05:41Z
citation Tehnički Vjesnik, vol. 30, no. 6, pp. 1799-1808, 2023-01-01
doi 10.17559/TV-20230324000477
affiliation He Liu: College for Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China
Yuekun Ma: 1) College for Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China 2) School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China 3) Hebei Key Laboratory of Industrial Intelligent Perception, Tangshan 063210, China
Chang Gao: College for Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China
Jia Qi: Inspur Electronic Information Industry Co., Ltd.
Dezheng Zhang: School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
topic attention mechanism
data augmentation
domain text
meta-learning
named entity recognition