A Lexical Resource-Constrained Topic Model for Word Relatedness

Word relatedness computation is an important supporting technology for many tasks in natural language processing. Traditionally, there have been two distinct strategies for word relatedness measurement: one utilizes corpus-based models, whereas the other leverages external lexical resources. However, each solution has its own strengths and weaknesses. In this paper, we propose a lexical resource-constrained topic model that effectively integrates the two complementary strategies. Our model is an extension of probabilistic latent semantic analysis that automatically learns word-level distributed representations for word relatedness measurement. Furthermore, we introduce the generalized expectation maximization (GEM) algorithm for statistical estimation. The proposed model not only inherits the advantage of conventional topic models in dimension reduction, but also refines parameter estimation by using word pairs that are known to be related. Experimental results in different languages demonstrate the effectiveness of our model in topic extraction and word relatedness measurement.


Bibliographic Details
Main Authors: Yongjing Yin, Jiali Zeng, Hongji Wang, Keqing Wu, Bin Luo, Jinsong Su
Format: Article
Language: English
Published: IEEE, 2019-01-01
Series: IEEE Access
Subjects: Natural language processing; unsupervised learning
Online Access: https://ieeexplore.ieee.org/document/8703742/
collection DOAJ
description Word relatedness computation is an important supporting technology for many tasks in natural language processing. Traditionally, there have been two distinct strategies for word relatedness measurement: one utilizes corpus-based models, whereas the other leverages external lexical resources. However, each solution has its own strengths and weaknesses. In this paper, we propose a lexical resource-constrained topic model that effectively integrates the two complementary strategies. Our model is an extension of probabilistic latent semantic analysis that automatically learns word-level distributed representations for word relatedness measurement. Furthermore, we introduce the generalized expectation maximization (GEM) algorithm for statistical estimation. The proposed model not only inherits the advantage of conventional topic models in dimension reduction, but also refines parameter estimation by using word pairs that are known to be related. Experimental results in different languages demonstrate the effectiveness of our model in topic extraction and word relatedness measurement.
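The abstract describes extending probabilistic latent semantic analysis (PLSA) so that word-level topic distributions can serve as distributed representations for relatedness measurement. As a rough illustration of that corpus-based side only, the sketch below fits plain PLSA with vanilla EM and scores word relatedness as the cosine similarity of the resulting P(topic | word) vectors. The lexical-resource constraint and the GEM update that are the paper's actual contribution are not reproduced here; all function names and the toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def plsa_word_vectors(counts, n_topics=2, n_iter=200, seed=0):
    """Fit plain PLSA by EM and return P(topic | word) row vectors.

    counts: (n_docs, n_words) term-frequency matrix.
    Note: this is unconstrained PLSA, not the paper's
    lexical-resource-constrained model estimated with GEM.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random, row-normalised initialisation of P(z|d) and P(w|z).
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(1, keepdims=True)
    for _ in range(n_iter):
        # E-step: posterior P(z | d, w) for every document/word pair.
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]          # shape (d, z, w)
        post = joint / joint.sum(1, keepdims=True).clip(1e-12)
        weighted = counts[:, None, :] * post                    # n(d, w) * P(z|d,w)
        # M-step: re-estimate P(z|d) and P(w|z) from expected counts.
        p_z_d = weighted.sum(2)
        p_z_d /= p_z_d.sum(1, keepdims=True).clip(1e-12)
        p_w_z = weighted.sum(0)
        p_w_z /= p_w_z.sum(1, keepdims=True).clip(1e-12)
    # Word representation: P(z|w) proportional to P(w|z) * P(z).
    p_z = (counts[:, None, :] * post).sum((0, 2))
    p_z /= p_z.sum()
    vecs = (p_w_z * p_z[:, None]).T                             # shape (w, z)
    vecs /= vecs.sum(1, keepdims=True).clip(1e-12)
    return vecs

def relatedness(vecs, i, j):
    """Cosine similarity between two words' topic distributions."""
    a, b = vecs[i], vecs[j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy corpus: words 0-1 co-occur in two documents, words 2-3 in the others,
# so their topic vectors typically separate into the two topics.
counts = np.array([[2, 2, 0, 0], [3, 1, 0, 0], [0, 0, 2, 3], [0, 0, 1, 2]], float)
vecs = plsa_word_vectors(counts)
```

The paper's model additionally biases the M-step with known-related word pairs from a lexical resource, which is why GEM (increasing rather than maximizing the expected likelihood at each step) is needed; the plain EM above has no such constraint term.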
first_indexed 2024-12-17T01:29:00Z
format Article
id doaj.art-a6b50515027141f2b10cfd759827db40
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-17T01:29:00Z
publishDate 2019-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-a6b50515027141f2b10cfd759827db40
Published in: IEEE Access, vol. 7, pp. 55261-55268, 2019-01-01 (IEEE). ISSN: 2169-3536. DOI: 10.1109/ACCESS.2019.2909104. Article number: 8703742.
Title: A Lexical Resource-Constrained Topic Model for Word Relatedness
Authors: Yongjing Yin (https://orcid.org/0000-0003-1138-4612), Jiali Zeng, Hongji Wang, Keqing Wu, Bin Luo, Jinsong Su, all with the Software School, Xiamen University, Xiamen, China
Online access: https://ieeexplore.ieee.org/document/8703742/
Subjects: Natural language processing; unsupervised learning
topic Natural language processing
unsupervised learning
url https://ieeexplore.ieee.org/document/8703742/