Challenging the Boundaries of Unsupervised Learning for Semantic Similarity

Bibliographic Details
Main Authors:	Atish Pawar, Vijay Mago
Format:	Article
Language:	English
Published:	IEEE 2019-01-01
Series:	IEEE Access
Subjects:	Corpus; lexical database; natural language processing; semantic analysis; sentence similarity; word similarity
Online Access:	https://ieeexplore.ieee.org/document/8630924/

author	Atish Pawar; Vijay Mago
collection	DOAJ
description	Semantic analysis plays a crucial role in text analytics research. Calculating the semantic similarity between sentences is a long-standing problem in natural language processing, and the problem changes significantly from one domain to another. In this paper, we present a methodology that can be applied across multiple domains by incorporating corpus-based statistics into a standardized semantic similarity algorithm. To calculate the semantic similarity between words and sentences, the proposed method follows an edge-based approach using a lexical database. When tested on both benchmark standards and a mean human similarity dataset, the methodology achieves high correlation for word similarity (r = 0.8753) and sentence similarity (r = 0.8793) with respect to the Rubenstein and Goodenough standard, and on the SICK dataset (r = 0.83241), outperforming other unsupervised models.
first_indexed	2024-12-16T23:28:19Z
format	Article
id	doaj.art-a3f7a8ed31b44092b65e5b7ec5685d0f
institution	Directory of Open Access Journals
issn	2169-3536
language	English
last_indexed	2024-12-16T23:28:19Z
publishDate	2019-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	record doaj.art-a3f7a8ed31b44092b65e5b7ec5685d0f; 2022-12-21T22:11:56Z; language eng; publisher IEEE; series IEEE Access; ISSN 2169-3536; published 2019-01-01; vol. 7, pp. 16291-16308; DOI 10.1109/ACCESS.2019.2891692; article 8630924; authors Atish Pawar (ORCID https://orcid.org/0000-0003-4857-4057) and Vijay Mago, both Department of Computer Science, Lakehead University, Thunder Bay, Canada
title	Challenging the Boundaries of Unsupervised Learning for Semantic Similarity
url	https://ieeexplore.ieee.org/document/8630924/
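
The abstract says the method "follows an edge-based approach using a lexical database" but gives no formula. As an illustration only, the sketch below implements one widely used edge-based word measure, the path-length and depth formulation of Li et al. (2006), s = exp(-αl) · tanh(βh), with the commonly quoted constants α = 0.2 and β = 0.45. Whether and how this paper adapts that measure is an assumption and is not confirmed by this record. The toy taxonomy `PARENT`, the depth convention (root depth 0) and the helper names `ancestors`, `depth` and `edge_similarity` are hypothetical. A real system would query a lexical database such as WordNet, and the paper's corpus-based statistics are not modeled here.

```python
import math

# Hypothetical toy hypernym tree (child -> parent); a stand-in for a real
# lexical database such as WordNet so the sketch runs without extra packages.
PARENT = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "artifact",
}

ALPHA = 0.2   # assumed path-length decay constant
BETA = 0.45   # assumed depth scaling constant


def ancestors(word):
    """Return the chain [word, parent, ..., root] in the toy tree."""
    chain = []
    node = word
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def depth(word):
    """Edges between word and the root; the root has depth 0 (assumed convention)."""
    return len(ancestors(word)) - 1


def edge_similarity(w1, w2):
    """exp(-ALPHA * l) * tanh(BETA * h), where l is the number of edges on the
    path joining w1 and w2 and h is the depth of their lowest common subsumer."""
    chain1, chain2 = ancestors(w1), ancestors(w2)
    # Lowest common subsumer: the first ancestor of w1 that also subsumes w2.
    lcs = next(node for node in chain1 if node in chain2)
    # In a tree, the shortest path between two nodes passes through their LCS.
    path_len = chain1.index(lcs) + chain2.index(lcs)
    return math.exp(-ALPHA * path_len) * math.tanh(BETA * depth(lcs))


if __name__ == "__main__":
    for pair in [("dog", "cat"), ("dog", "car"), ("dog", "dog")]:
        print(pair, round(edge_similarity(*pair), 4))
```

In this toy tree "dog" and "cat" meet at "animal" and score higher than "dog" and "car", which share only the root and therefore score 0 under the assumed depth convention. The exponential term penalizes long paths, while the tanh term rewards pairs whose common subsumer sits deep, and therefore is more specific, in the hierarchy.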

Similar Items