A Novel Word Clustering and Cluster Merging Technique for Named Entity Recognition

In this paper, we present a novel word clustering technique to capture contextual similarity among the words. Related word clustering techniques in the literature rely on the statistics of the words collected from a fixed and small word window. For example, the Brown clustering algorithm is based on...

Bibliographic Details
Main Authors: Patra Rakesh, Saha Sujan Kumar
Format: Article
Language: English
Published: De Gruyter 2019-01-01
Series: Journal of Intelligent Systems
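Subjects: word clustering; Brown clustering; hierarchical clustering; cluster merging; named entity recognition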
Online Access: https://doi.org/10.1515/jisys-2016-0074
collection DOAJ
description In this paper, we present a novel word clustering technique to capture contextual similarity among the words. Related word clustering techniques in the literature rely on the statistics of the words collected from a fixed and small word window. For example, the Brown clustering algorithm is based on bigram statistics of the words. However, in sequential labeling tasks such as named entity recognition (NER), words from a longer context also carry valuable information. To capture this longer context information, we propose a new word clustering algorithm that uses parse information of the sentences and a nonfixed word window. The proposed algorithm, named variable window clustering, performs better than Brown clustering in our experiments. Additionally, to use two different clustering techniques simultaneously in a classifier, we propose a cluster merging technique that performs an output-level merging of two sets of clusters. To test the effectiveness of the approaches, we use two different NER data sets, namely, Hindi and BioCreative II Gene Mention Recognition. A baseline NER system is developed using a conditional random fields classifier, and then the clusters produced by the individual techniques and by the merged technique are incorporated to improve the classifier. Experimental results demonstrate that the cluster merging technique is quite promising.
first_indexed 2024-12-17T19:19:31Z
id doaj.art-c0e69949290b4511bb22962173495007
institution Directory Open Access Journal
issn 0334-1860
2191-026X
last_indexed 2024-12-17T19:19:31Z
spelling doaj.art-c0e69949290b4511bb22962173495007; 2022-12-21T21:35:38Z; eng; De Gruyter; Journal of Intelligent Systems; 0334-1860; 2191-026X; 2019-01-01; vol. 28, no. 1, pp. 15-30; 10.1515/jisys-2016-0074; Patra Rakesh (Department of Computer Science and Engineering, Birla Institute of Technology Mesra, Ranchi, India); Saha Sujan Kumar (Department of Computer Science and Engineering, Birla Institute of Technology Mesra, Ranchi, India)
topic word clustering
Brown clustering
hierarchical clustering
cluster merging
named entity recognition
91C20
68T50
68T30
62H30
url https://doi.org/10.1515/jisys-2016-0074