Deep recurrent neural networks with word embeddings for Urdu named entity recognition


Bibliographic Details
Main Authors: Wahab Khan, Ali Daud, Fahd Alotaibi, Naif Aljohani, Sachi Arafat
Format: Article
Language: English
Published: Electronics and Telecommunications Research Institute (ETRI) 2019-07-01
Series: ETRI Journal
Online Access: https://doi.org/10.4218/etrij.2018-0553
Description
Summary: Named entity recognition (NER) continues to be an important task in natural language processing because it is featured as a subtask and/or subproblem in information extraction and machine translation. For Urdu language processing it is a particularly difficult task. This paper proposes various deep recurrent neural network (DRNN) learning models with word embeddings. Experimental results demonstrate that they improve upon current state-of-the-art NER approaches for Urdu. The DRNN models evaluated include forward and bidirectional extensions of the long short-term memory and backpropagation-through-time approaches. The proposed models consider both language-dependent features, such as part-of-speech tags, and language-independent features, such as the "context windows" of words. The effectiveness of the DRNN models with word embeddings for NER in Urdu is demonstrated using three datasets. The results reveal that the proposed approach significantly outperforms previous conditional random field and artificial neural network approaches. The best f-measure values achieved on the three benchmark datasets using the proposed deep learning approaches are 81.1%, 79.94%, and 63.21%, respectively.
ISSN: 1225-6463
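
As a rough illustration of the kind of model the abstract describes, a minimal bidirectional LSTM tagger over word embeddings can be sketched in PyTorch as follows. The class name, layer sizes, vocabulary size, and tag count are illustrative assumptions only, not the authors' implementation, which also incorporates language-dependent features such as part-of-speech tags.

# Minimal sketch of a bidirectional LSTM named-entity tagger, assuming PyTorch.
# Sizes and names are hypothetical; the paper's models additionally use POS and
# context-window features and are trained on Urdu NER corpora.
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_tags):
        super().__init__()
        # Word embeddings; in practice these could be pretrained Urdu embeddings.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Bidirectional LSTM reads each sentence forward and backward.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Per-token projection onto the named-entity tag set.
        self.classifier = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        hidden, _ = self.lstm(embedded)        # (batch, seq_len, 2 * hidden_dim)
        return self.classifier(hidden)         # (batch, seq_len, num_tags)

# Toy usage with made-up sizes: a batch of two 5-token sentences.
model = BiLSTMTagger(vocab_size=10000, embed_dim=100, hidden_dim=128, num_tags=7)
tokens = torch.randint(0, 10000, (2, 5))
tag_scores = model(tokens)                     # shape: (2, 5, 7)

Concatenating the forward and backward hidden states gives each token both left and right context, which is what the bidirectional extension mentioned in the abstract provides over a purely forward recurrent model.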