Chinese Named Entity Recognition Method for Domain-Specific Text
The Chinese named entity recognition (NER) is a critical task in natural language processing, aiming at identifying and classifying named entities in text. However, the specificity of domain texts and the lack of large-scale labelled datasets have led to the poor performance of NER methods trained o...
Main Authors: | He Liu, Yuekun Ma, Chang Gao, Jia Qi, Dezheng Zhang |
---|---|
Format: | Article |
Language: | English |
Published: | Faculty of Mechanical Engineering in Slavonski Brod, Faculty of Electrical Engineering in Osijek, Faculty of Civil Engineering in Osijek, 2023-01-01 |
Series: | Tehnički Vjesnik |
Subjects: | attention mechanism, data augmentation, domain text, meta-learning, named entity recognition |
Online Access: | https://hrcak.srce.hr/file/446400 |
_version_ | 1827281766453346304 |
---|---|
author | He Liu, Yuekun Ma, Chang Gao, Jia Qi, Dezheng Zhang |
author_facet | He Liu, Yuekun Ma, Chang Gao, Jia Qi, Dezheng Zhang |
author_sort | He Liu |
collection | DOAJ |
description | The Chinese named entity recognition (NER) is a critical task in natural language processing, aiming at identifying and classifying named entities in text. However, the specificity of domain texts and the lack of large-scale labelled datasets have led to the poor performance of NER methods trained on public domain corpora on domain texts. In this paper, a named entity recognition method incorporating sentence semantic information is proposed, mainly by adaptively incorporating sentence semantic information into character semantic information through an attention mechanism and a gating mechanism to enhance entity feature representation while attenuating the noise generated by irrelevant character information. In addition, to address the lack of large-scale labelled samples, we used data self-augmentation methods to expand the training samples. Furthermore, we introduced a Weighted Strategy considering that the low-quality samples generated by the data self-augmentation process can have a negative impact on the model. Experiments on the TCM prescriptions corpus showed that the F1 values of our method outperformed the comparison methods. |
first_indexed | 2024-04-24T09:05:41Z |
format | Article |
id | doaj.art-fd5c675e5056442c8bd74b3e9307e0bc |
institution | Directory of Open Access Journals |
issn | 1330-3651, 1848-6339 |
language | English |
last_indexed | 2024-04-24T09:05:41Z |
publishDate | 2023-01-01 |
publisher | Faculty of Mechanical Engineering in Slavonski Brod, Faculty of Electrical Engineering in Osijek, Faculty of Civil Engineering in Osijek |
record_format | Article |
series | Tehnički Vjesnik |
spelling | doaj.art-fd5c675e5056442c8bd74b3e9307e0bc; 2024-04-15T19:00:38Z; eng; Faculty of Mechanical Engineering in Slavonski Brod, Faculty of Electrical Engineering in Osijek, Faculty of Civil Engineering in Osijek; Tehnički Vjesnik; 1330-3651; 1848-6339; 2023-01-01; vol. 30, no. 6, pp. 1799-1808; 10.17559/TV-20230324000477; Chinese Named Entity Recognition Method for Domain-Specific Text; He Liu [0]; Yuekun Ma [1]; Chang Gao [2]; Jia Qi [3]; Dezheng Zhang [4]; [0] College for Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China; [1] 1) College for Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China 2) School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China 3) Hebei Key Laboratory of Industrial Intelligent Perception, Tangshan 063210, China; [2] College for Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China; [3] Inspur Electronic Information Industry Co., Ltd.; [4] School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China; The Chinese named entity recognition (NER) is a critical task in natural language processing, aiming at identifying and classifying named entities in text. However, the specificity of domain texts and the lack of large-scale labelled datasets have led to the poor performance of NER methods trained on public domain corpora on domain texts. In this paper, a named entity recognition method incorporating sentence semantic information is proposed, mainly by adaptively incorporating sentence semantic information into character semantic information through an attention mechanism and a gating mechanism to enhance entity feature representation while attenuating the noise generated by irrelevant character information. In addition, to address the lack of large-scale labelled samples, we used data self-augmentation methods to expand the training samples. Furthermore, we introduced a Weighted Strategy considering that the low-quality samples generated by the data self-augmentation process can have a negative impact on the model. Experiments on the TCM prescriptions corpus showed that the F1 values of our method outperformed the comparison methods.; https://hrcak.srce.hr/file/446400; attention mechanism; data augmentation; domain text; meta-learning; named entity recognition |
spellingShingle | He Liu; Yuekun Ma; Chang Gao; Jia Qi; Dezheng Zhang; Chinese Named Entity Recognition Method for Domain-Specific Text; Tehnički Vjesnik; attention mechanism; data augmentation; domain text; meta-learning; named entity recognition |
title | Chinese Named Entity Recognition Method for Domain-Specific Text |
title_full | Chinese Named Entity Recognition Method for Domain-Specific Text |
title_fullStr | Chinese Named Entity Recognition Method for Domain-Specific Text |
title_full_unstemmed | Chinese Named Entity Recognition Method for Domain-Specific Text |
title_short | Chinese Named Entity Recognition Method for Domain-Specific Text |
title_sort | chinese named entity recognition method for domain specific text |
topic | attention mechanism, data augmentation, domain text, meta-learning, named entity recognition |
url | https://hrcak.srce.hr/file/446400 |
work_keys_str_mv | AT heliu chinesenamedentityrecognitionmethodfordomainspecifictext AT yuekunma chinesenamedentityrecognitionmethodfordomainspecifictext AT changgao chinesenamedentityrecognitionmethodfordomainspecifictext AT jiaqi chinesenamedentityrecognitionmethodfordomainspecifictext AT dezhengzhang chinesenamedentityrecognitionmethodfordomainspecifictext |