A Chinese Named Entity Recognition Method Based on ERNIE-BiLSTM-CRF for Food Safety Domain

Food safety is closely related to human health. Therefore, named entity recognition technology is used to extract named entities related to food safety, and building a regulatory knowledge graph in the field of food safety can help relevant authorities to regulate food safety issues and mitigate the...

Full description

Bibliographic Details
Main Authors: Taiping Yuan, Xizhong Qin, Chunji Wei
Format: Article
Language:English
Published: MDPI AG 2023-02-01
Series:Applied Sciences
Subjects:
Online Access:https://www.mdpi.com/2076-3417/13/5/2849
Description
Summary:Food safety is closely related to human health. Therefore, named entity recognition technology is used to extract named entities related to food safety, and building a regulatory knowledge graph in the field of food safety can help relevant authorities to regulate food safety issues and mitigate the hazards caused by food safety problems. However, there is no publicly available named entity recognition dataset in the food safety domain. In contrast, the non-standardized Chinese short texts generated from user comments on the web contain rich implicit information that can help identify named entities in specific domains (e.g., food safety domain) where the corpus is scarce. Therefore, in this paper, named entities related to food safety are extracted from these unstandardized texts on the web. However, the existing Chinese named entity identification methods are mainly for standardized texts. Meanwhile, these unstandardized texts have the following problems: (1) their corpus size is small; (2) there are various new and wrong words and noise; (3) and they do not follow strict syntactic rules. These problems make the recognition of Chinese named entities for online texts more challenging. Therefore, this paper proposes the ERNIE-Adv-BiLSTM-Att-CRF model to improve the recognition of food safety domain entities in unstandardized texts. Specifically, adversarial training is added to the model training as a regularization method to alleviate the influence of noise on the model, while self-attention is added to the BiLSTM-CRF model to capture features that significant impact entity classification and improve the accuracy of entity classification. This paper conducts experiments on the public dataset Weibo NER and the self-built food domain dataset Food. The experimental results show that our model achieves a SOTA performance of 72.64% and a good performance of 69.68% for F1 values on the public and self-built datasets, respectively. The validity and reasonableness of our model are verified. In addition, the paper further analyses the impact of various components and settings on the model. The study has practical implications in the field of food safety.
ISSN:2076-3417