Challenging the Boundaries of Unsupervised Learning for Semantic Similarity

Bibliographic Details
Main Authors:	Atish Pawar, Vijay Mago
Format:	Article
Language:	English
Published:	IEEE 2019-01-01
Series:	IEEE Access
Subjects:	Corpus; lexical database; natural language processing; semantic analysis; sentence similarity; word similarity
Online Access:	https://ieeexplore.ieee.org/document/8630924/

Description
Summary:	Semantic analysis plays a crucial role in text analytics research. Calculating the semantic similarity between sentences is a long-standing problem in natural language processing, and the problem differs significantly from one domain of operation to another. In this paper, we present a methodology that can be applied across multiple domains by incorporating corpora-based statistics into a standardized semantic similarity algorithm. To calculate the semantic similarity between words and sentences, the proposed method follows an edge-based approach using a lexical database. When tested against both benchmark standards and a mean human similarity dataset, the methodology achieves high correlation for both word similarity (r = 0.8753) and sentence similarity (r = 0.8793) with respect to the Rubenstein and Goodenough standard, and on the SICK dataset (r = 0.83241), outperforming other unsupervised models.
ISSN:	2169-3536
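
Note: The summary mentions an edge-based approach over a lexical database and reports correlation values against human ratings. The sketch below is only a rough illustration of those two ideas and is not the authors' algorithm. It assumes a hypothetical toy "is-a" hierarchy (TOY_HYPERNYMS) in place of a real lexical database, uses a simple 1 / (1 + path length) score as a stand-in edge-based measure, and assumes that r denotes Pearson's correlation coefficient. All function names are invented for this example.

```python
from collections import deque
from math import sqrt

# Hypothetical toy "is-a" hierarchy (child -> parent). It is illustrative
# only and is not taken from any real lexical database.
TOY_HYPERNYMS = {
    "car": "vehicle",
    "bicycle": "vehicle",
    "vehicle": "artifact",
    "hammer": "tool",
    "tool": "artifact",
    "artifact": "entity",
}


def build_graph(hypernyms):
    """Turn child -> parent links into an undirected adjacency map."""
    graph = {}
    for child, parent in hypernyms.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def path_length(graph, source, target):
    """Return the number of edges on the shortest path, or None if unreachable."""
    if source not in graph or target not in graph:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def edge_similarity(graph, word_a, word_b):
    """Assumed stand-in measure: fewer edges between words means higher similarity."""
    dist = path_length(graph, word_a, word_b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

In an evaluation of the kind the summary describes, scores such as those from edge_similarity would be computed for each word pair in a benchmark and passed to pearson_r together with the corresponding human ratings.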

Similar Items