Context-Aware Bidirectional Neural Model for Sindhi Named Entity Recognition

Named entity recognition (NER) is a fundamental task in many natural language processing (NLP) applications, such as text summarization and semantic information retrieval. Recently, deep neural networks with attention mechanisms have yielded excellent performance in NER by taking advantage of character-level and word-level representation learning. In this paper, we propose a deep context-aware bidirectional long short-term memory (CaBiLSTM) model for the Sindhi NER task. The model relies upon contextual representation learning (CRL), a bidirectional encoder, self-attention, and a sequential conditional random field (CRF). The CaBiLSTM model incorporates task-oriented CRL based on joint character-level and word-level representations. It takes character-level input to learn the character representations. Afterwards, the character representations are transformed into word features, and the bidirectional encoder learns the word representations. The output of the final encoder is fed into the self-attention layer through a hidden layer before decoding. Finally, we employ the CRF to predict the label sequences. The baselines and the proposed CaBiLSTM model are compared by exploiting pretrained Sindhi GloVe (SdGloVe), Sindhi fastText (SdfastText), task-oriented, and CRL-based word representations on the recently proposed SiNER dataset. Our proposed CaBiLSTM model achieved a high F1-score of 91.25% on the SiNER dataset with CRL, without relying on additional hand-crafted features such as rules, gazetteers, or dictionaries.

Bibliographic Details
Main Authors: Wazir Ali, Jay Kumar, Zenglin Xu, Rajesh Kumar, Yazhou Ren
Format: Article
Language: English
Published: MDPI AG, 2021-09-01
Series: Applied Sciences
Subjects: Sindhi named entity recognition; recurrent neural networks; CaBiLSTM; self-attention mechanism; contextual representation learning
Online Access: https://www.mdpi.com/2076-3417/11/19/9038
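The abstract's final step, CRF prediction of label sequences, can be illustrated with a minimal Viterbi decoding sketch. This is not the authors' implementation: the tag set, emission scores (standing in for the BiLSTM/self-attention outputs), and transition scores below are all invented for illustration.

```python
# Hypothetical sketch of linear-chain CRF decoding for NER: given
# per-token emission scores and learned tag-transition scores, Viterbi
# decoding returns the highest-scoring label sequence.

TAGS = ["O", "B-PER", "I-PER"]  # tiny illustrative tag set

def viterbi_decode(emissions, transitions):
    """emissions: one {tag: score} dict per token;
    transitions: {(prev_tag, tag): score}.
    Returns the highest-scoring tag sequence."""
    # best[t] = (score of best path ending in tag t, previous tag)
    best = {t: (emissions[0][t], None) for t in TAGS}
    history = []
    for scores in emissions[1:]:
        new_best = {}
        for t in TAGS:
            # Pick the best previous tag for each current tag.
            prev, s = max(
                ((p, best[p][0] + transitions[(p, t)]) for p in TAGS),
                key=lambda x: x[1],
            )
            new_best[t] = (s + scores[t], prev)
        history.append(best)
        best = new_best
    # Backtrack from the best final tag.
    tag = max(best, key=lambda t: best[t][0])
    path, prev = [tag], best[tag][1]
    for step in reversed(history):
        path.append(prev)
        prev = step[prev][1]
    path.reverse()
    return path

# Toy input: two tokens; the transition O -> I-PER is heavily penalized.
emissions = [
    {"O": 0.1, "B-PER": 2.0, "I-PER": 0.0},
    {"O": 0.5, "B-PER": 0.0, "I-PER": 1.5},
]
transitions = {(p, t): 0.0 for p in TAGS for t in TAGS}
transitions[("O", "I-PER")] = -10.0

print(viterbi_decode(emissions, transitions))  # ['B-PER', 'I-PER']
```

The transition scores are what distinguish a CRF from per-token classification: they let the decoder reject invalid sequences (here, an I-PER that does not follow a B-PER or I-PER), which is why the paper places a CRF after the self-attention layer.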
DOI: 10.3390/app11199038
ISSN: 2076-3417
Volume/Issue: 11(19), article no. 9038
Author Affiliation (all authors): School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu 611713, China
Collection: Directory of Open Access Journals (DOAJ), record id doaj.art-00e6f5d30bcc45c8a92819b1c01680a9