A Chinese Named Entity Recognition Method Based on ERNIE-BiLSTM-CRF for Food Safety Domain

Bibliographic Details
Main Authors: Taiping Yuan, Xizhong Qin, Chunji Wei
Format: Article
Language: English
Published: MDPI AG, 2023-02-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/13/5/2849

Abstract
Food safety is closely related to human health. Using named entity recognition (NER) to extract food-safety-related named entities, and building a regulatory knowledge graph for the food safety domain, can help the relevant authorities regulate food safety issues and mitigate the hazards caused by food safety problems. However, there is no publicly available NER dataset for the food safety domain. By contrast, the non-standardized Chinese short texts generated by user comments on the web contain rich implicit information that can help identify named entities in specific domains where corpora are scarce, such as food safety. This paper therefore extracts food-safety-related named entities from such non-standardized web texts. However, existing Chinese NER methods mainly target standardized texts, and non-standardized texts pose three problems: (1) the available corpus is small; (2) they contain many new words, erroneous words, and noise; and (3) they do not follow strict syntactic rules. These problems make Chinese NER on online texts more challenging. To address them, this paper proposes the ERNIE-Adv-BiLSTM-Att-CRF model to improve the recognition of food safety entities in non-standardized texts. Specifically, adversarial training is added to model training as a regularization method to reduce the influence of noise on the model, and self-attention is added to the BiLSTM-CRF model to capture features that significantly affect entity classification, improving classification accuracy. Experiments are conducted on the public Weibo NER dataset and on Food, a self-built food-domain dataset. The experimental results show that the model achieves state-of-the-art (SOTA) performance on the public dataset, with an F1 score of 72.64%, and good performance on the self-built dataset, with an F1 score of 69.68%, which verifies the validity and reasonableness of the model. The paper also analyzes the impact of various components and settings on the model. The study has practical implications for food safety.
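
The abstract says adversarial training is added as a regularizer against noise but does not name the method. A common choice for NER models built on pre-trained encoders is the Fast Gradient Method (FGM), which perturbs the input embedding along the normalized loss gradient, r_adv = ε·g/‖g‖₂, and adds the loss on the perturbed input to the clean loss. The sketch below assumes FGM and, as a simplification, uses a toy logistic loss on a single embedding vector in place of the full ERNIE-BiLSTM-Att-CRF network; the function names and ε values are illustrative, not taken from the paper.

```python
import math


def fgm_perturbation(grad, epsilon=1.0):
    """Assumed FGM step: epsilon * g / ||g||_2 (all zeros if g is zero)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]


def logistic_loss(w, x, y):
    """Toy stand-in for the NER loss: -log sigmoid(y * w.x), with y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    # log(1 + exp(-margin)), written to stay numerically stable for large |margin|
    return math.log1p(math.exp(-abs(margin))) + max(0.0, -margin)


def logistic_loss_grad_x(w, x, y):
    """Gradient of logistic_loss with respect to the embedding x."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    coeff = -y / (1.0 + math.exp(margin))  # equals -y * sigmoid(-margin)
    return [coeff * wi for wi in w]


def adversarial_training_loss(w, x, y, epsilon=0.5):
    """Return (clean loss, loss on the FGM-perturbed embedding, their sum)."""
    clean = logistic_loss(w, x, y)
    r_adv = fgm_perturbation(logistic_loss_grad_x(w, x, y), epsilon)
    x_adv = [xi + ri for xi, ri in zip(x, r_adv)]
    adv = logistic_loss(w, x_adv, y)
    return clean, adv, clean + adv
```

For example, `adversarial_training_loss([0.5, -1.0], [1.0, 0.2], 1)` returns an adversarial loss larger than the clean loss, because the perturbation moves the embedding in the direction that most increases the loss. In a full model, the usual FGM routine runs the same steps on the embedding layer: back-propagate the clean loss, add r_adv to the embedding weights, back-propagate the adversarial loss, then restore the original embeddings before the optimizer step.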
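
The abstract adds self-attention on top of the BiLSTM so that each token's representation can draw on the tokens most relevant to its entity label. The record does not give the exact attention formulation; the sketch below assumes standard scaled dot-product self-attention, softmax(HHᵀ/√d)H, applied directly to a BiLSTM output matrix H. For brevity it omits the learned query, key, and value projections a real layer would normally include, and H is a hand-made toy matrix rather than real BiLSTM output.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(hidden):
    """Scaled dot-product self-attention with Q = K = V = hidden.

    hidden: list of n token vectors (e.g. BiLSTM outputs), each of length d.
    Returns (outputs, weights): n context vectors and the n x n attention matrix.
    """
    d = len(hidden[0])
    scale = math.sqrt(d)
    weights, outputs = [], []
    for query in hidden:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in hidden]
        row = softmax(scores)
        weights.append(row)
        # Context vector: attention-weighted sum of all token vectors.
        outputs.append([sum(a * value[j] for a, value in zip(row, hidden)) for j in range(d)])
    return outputs, weights
```

For example, with `self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])`, each row of the returned weight matrix sums to 1, and the first token gives equal weight to itself and to the third token because its dot products with both are equal. In a BiLSTM-Att-CRF tagger, the context vectors would typically be mapped to per-tag emission scores for the CRF layer.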

Additional Details
Affiliation (all authors): College of Information Science and Engineering, Xinjiang University, Urumqi 830049, China
Volume/Issue: 13(5), article 2849
ISSN: 2076-3417
DOI: 10.3390/app13052849
Keywords: food safety supervision; named entity recognition; pre-trained language model; ERNIE; adversarial training; BiLSTM-CRF
Collection: DOAJ (Directory of Open Access Journals)