MoGCN: Mixture of Gated Convolutional Neural Network for Named Entity Recognition of Chinese Historical Texts

Named Entity Recognition (NER) systems have advanced considerably over the past decade thanks to deep neural networks. However, state-of-the-art NER methods have seldom been applied to Chinese historical texts, owing to the lack of standard corpora in Chinese historical domains and the difficulty of accessing a...

Bibliographic Details
Main Authors: Chengxi Yan, Qi Su, Jun Wang
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: Named entity recognition; gated neural network; Chinese historical texts
Online Access: https://ieeexplore.ieee.org/document/9206017/
_version_ 1819158790609567744
author Chengxi Yan
Qi Su
Jun Wang
author_facet Chengxi Yan
Qi Su
Jun Wang
author_sort Chengxi Yan
collection DOAJ
description Named Entity Recognition (NER) systems have advanced considerably over the past decade thanks to deep neural networks. However, state-of-the-art NER methods have seldom been applied to Chinese historical texts, owing to the lack of standard corpora in Chinese historical domains and the difficulty of accessing a high-quality ancient corpus. This paper addresses both issues and proposes an efficient automatic processing solution for NER on ancient Chinese data, comprising a data-driven tagging procedure and a novel end-to-end network named “MoGCN” (Mixture of Gated Convolutional Neural Network). Our tagging approach is used to generate a corpus covering three genres of Chinese historical classics, on which we test the generalization ability of the proposed model. The empirical analysis shows that the proposed model achieves the best results on this dataset, improving the F1-score by more than 1.5% over other sophisticated models, and that performance depends positively on corpus quality. Furthermore, according to our auxiliary attribute analysis, the model performs much better on shorter entities, especially 2-character ones, while many long-range entities are identified only by our model. This work is a preliminary exploration of NER for historical data and offers insights and reference value for similar tasks. Future work should further explore NER optimization on massive traditional Chinese texts using linguistic features and learning strategies.
first_indexed 2024-12-22T16:30:16Z
format Article
id doaj.art-205745367bba4d39bf1c9be735248357
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-22T16:30:16Z
publishDate 2020-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-205745367bba4d39bf1c9be735248357 | 2022-12-21T18:20:04Z | eng | IEEE | IEEE Access | 2169-3536 | 2020-01-01 | 8 | 181629 | 181639 | 10.1109/ACCESS.2020.3026535 | 9206017 | MoGCN: Mixture of Gated Convolutional Neural Network for Named Entity Recognition of Chinese Historical Texts | Chengxi Yan [0], https://orcid.org/0000-0003-1128-550X | Qi Su [1] | Jun Wang [2] | [0] Department of Information Management, Peking University, Beijing, China | [1] School of Foreign Languages, Peking University, Beijing, China | [2] Department of Information Management, Peking University, Beijing, China | https://ieeexplore.ieee.org/document/9206017/ | Named entity recognition | gated neural network | Chinese historical texts
spellingShingle Chengxi Yan
Qi Su
Jun Wang
MoGCN: Mixture of Gated Convolutional Neural Network for Named Entity Recognition of Chinese Historical Texts
IEEE Access
Named entity recognition
gated neural network
Chinese historical texts
title MoGCN: Mixture of Gated Convolutional Neural Network for Named Entity Recognition of Chinese Historical Texts
title_full MoGCN: Mixture of Gated Convolutional Neural Network for Named Entity Recognition of Chinese Historical Texts
title_fullStr MoGCN: Mixture of Gated Convolutional Neural Network for Named Entity Recognition of Chinese Historical Texts
title_full_unstemmed MoGCN: Mixture of Gated Convolutional Neural Network for Named Entity Recognition of Chinese Historical Texts
title_short MoGCN: Mixture of Gated Convolutional Neural Network for Named Entity Recognition of Chinese Historical Texts
title_sort mogcn mixture of gated convolutional neural network for named entity recognition of chinese historical texts
topic Named entity recognition
gated neural network
Chinese historical texts
url https://ieeexplore.ieee.org/document/9206017/
work_keys_str_mv AT chengxiyan mogcnmixtureofgatedconvolutionalneuralnetworkfornamedentityrecognitionofchinesehistoricaltexts
AT qisu mogcnmixtureofgatedconvolutionalneuralnetworkfornamedentityrecognitionofchinesehistoricaltexts
AT junwang mogcnmixtureofgatedconvolutionalneuralnetworkfornamedentityrecognitionofchinesehistoricaltexts