A Novel Word Clustering and Cluster Merging Technique for Named Entity Recognition
In this paper, we present a novel word clustering technique to capture contextual similarity among the words. Related word clustering techniques in the literature rely on the statistics of the words collected from a fixed and small word window. For example, the Brown clustering algorithm is based on...
Main Authors: | Patra Rakesh, Saha Sujan Kumar |
---|---|
Format: | Article |
Language: | English |
Published: | De Gruyter, 2019-01-01 |
Series: | Journal of Intelligent Systems |
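Subjects: | word clustering; brown clustering; hierarchical clustering; cluster merging; named entity recognition |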
Online Access: | https://doi.org/10.1515/jisys-2016-0074 |
_version_ | 1818716454959185920 |
---|---|
author | Patra Rakesh; Saha Sujan Kumar |
author_facet | Patra Rakesh; Saha Sujan Kumar |
author_sort | Patra Rakesh |
collection | DOAJ |
description | In this paper, we present a novel word clustering technique to capture contextual similarity among words. Related word clustering techniques in the literature rely on word statistics collected from a small, fixed word window. For example, the Brown clustering algorithm is based on bigram statistics of the words. However, in sequential labeling tasks such as named entity recognition (NER), words from a longer context also carry valuable information. To capture this longer context, we propose a new word clustering algorithm that uses parse information of the sentences and a nonfixed word window. The proposed algorithm, named variable window clustering, performs better than Brown clustering in our experiments. Additionally, to use two different clustering techniques simultaneously in a classifier, we propose a cluster merging technique that performs an output-level merging of two sets of clusters. To test the effectiveness of the approaches, we use two different NER data sets, namely Hindi and BioCreative II Gene Mention Recognition. A baseline NER system is developed using a conditional random fields classifier, and then the clusters from the individual techniques as well as from the merged technique are incorporated to improve the classifier. Experimental results demonstrate that the cluster merging technique is quite promising. |
first_indexed | 2024-12-17T19:19:31Z |
format | Article |
id | doaj.art-c0e69949290b4511bb22962173495007 |
institution | Directory of Open Access Journals |
issn | 0334-1860 2191-026X |
language | English |
last_indexed | 2024-12-17T19:19:31Z |
publishDate | 2019-01-01 |
publisher | De Gruyter |
record_format | Article |
series | Journal of Intelligent Systems |
spelling | doaj.art-c0e69949290b4511bb22962173495007; 2022-12-21T21:35:38Z; eng; De Gruyter; Journal of Intelligent Systems; 0334-1860; 2191-026X; 2019-01-01; 28; 1; 15; 30; 10.1515/jisys-2016-0074; A Novel Word Clustering and Cluster Merging Technique for Named Entity Recognition; Patra Rakesh 0; Saha Sujan Kumar 1; Department of Computer Science and Engineering, Birla Institute of Technology Mesra, Ranchi, India; Department of Computer Science and Engineering, Birla Institute of Technology Mesra, Ranchi, India; In this paper, we present a novel word clustering technique to capture contextual similarity among the words. Related word clustering techniques in the literature rely on the statistics of the words collected from a fixed and small word window. For example, the Brown clustering algorithm is based on bigram statistics of the words. However, in the sequential labeling tasks such as named entity recognition (NER), longer context words also carry valuable information. To capture this longer context information, we propose a new word clustering algorithm, which uses parse information of the sentences and a nonfixed word window. This proposed clustering algorithm, named as variable window clustering, performs better than Brown clustering in our experiments. Additionally, to use two different clustering techniques simultaneously in a classifier, we propose a cluster merging technique that performs an output level merging of two sets of clusters. To test the effectiveness of the approaches, we use two different NER data sets, namely, Hindi and BioCreative II Gene Mention Recognition. A baseline NER system is developed using conditional random fields classifier, and then the clusters using individual techniques as well as the merged technique are incorporated to improve the classifier. Experimental results demonstrate that the cluster merging technique is quite promising.; https://doi.org/10.1515/jisys-2016-0074; word clustering; brown clustering; hierarchical clustering; cluster merging; named entity recognition; 91c20; 68t50; 68t30; 62h30 |
spellingShingle | Patra Rakesh; Saha Sujan Kumar; A Novel Word Clustering and Cluster Merging Technique for Named Entity Recognition; Journal of Intelligent Systems; word clustering; brown clustering; hierarchical clustering; cluster merging; named entity recognition; 91c20; 68t50; 68t30; 62h30 |
title | A Novel Word Clustering and Cluster Merging Technique for Named Entity Recognition |
title_sort | novel word clustering and cluster merging technique for named entity recognition |
topic | word clustering; brown clustering; hierarchical clustering; cluster merging; named entity recognition; 91c20; 68t50; 68t30; 62h30 |
url | https://doi.org/10.1515/jisys-2016-0074 |
work_keys_str_mv | AT patrarakesh anovelwordclusteringandclustermergingtechniquefornamedentityrecognition AT sahasujankumar anovelwordclusteringandclustermergingtechniquefornamedentityrecognition AT patrarakesh novelwordclusteringandclustermergingtechniquefornamedentityrecognition AT sahasujankumar novelwordclusteringandclustermergingtechniquefornamedentityrecognition |
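
The abstract describes merging two sets of word clusters at the output level and then using the cluster information in a conditional random fields classifier. This record does not give the merging procedure, so the sketch below is only an illustration under a stated assumption: each word's merged cluster is taken to be the pair of its labels from the two clusterings, which may differ from the authors' actual technique. The function names, the `UNK` placeholder, and the toy cluster labels are all hypothetical.

```python
from typing import Dict, Tuple

UNKNOWN = "UNK"  # hypothetical placeholder for words missing from a clustering


def merge_clusterings(
    first: Dict[str, str], second: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Pair each word's label in `first` with its label in `second`.

    ASSUMPTION: this label pairing is a stand-in for the paper's
    output-level merging, which the record does not specify.
    """
    vocabulary = set(first) | set(second)
    return {
        word: (first.get(word, UNKNOWN), second.get(word, UNKNOWN))
        for word in sorted(vocabulary)
    }


def cluster_features(
    word: str,
    first: Dict[str, str],
    second: Dict[str, str],
    merged: Dict[str, Tuple[str, str]],
) -> Dict[str, str]:
    """Build string-valued cluster features for one token."""
    pair = merged.get(word, (UNKNOWN, UNKNOWN))
    return {
        "cluster_first": first.get(word, UNKNOWN),
        "cluster_second": second.get(word, UNKNOWN),
        "cluster_merged": f"{pair[0]}|{pair[1]}",
    }


# Toy data: labels are invented and do not come from the paper.
brown = {"delhi": "0110", "mumbai": "0110", "india": "0111", "gene": "1010"}
variable_window = {"delhi": "c3", "mumbai": "c3", "india": "c3", "protein": "c7"}

merged = merge_clusterings(brown, variable_window)
delhi_features = cluster_features("delhi", brown, variable_window, merged)
```

In this toy example, "delhi" and "mumbai" share a merged cluster because they agree in both clusterings, while "india" is separated from them because its label differs in the first clustering. The per-token feature dictionary is one plausible shape for passing cluster information to a sequence labeler.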