Challenging the Boundaries of Unsupervised Learning for Semantic Similarity

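The semantic analysis field has a crucial role to play in the research related to text analytics. Calculating the semantic similarity between sentences is a long-standing problem in the area of natural language processing, and it differs significantly as the domain of operation differs. In this paper, we present a methodology that can be applied across multiple domains by incorporating corpora-based statistics into a standardized semantic similarity algorithm. To calculate the semantic similarity between words and sentences, the proposed method follows an edge-based approach using a lexical database. When tested on both benchmark standards and mean human similarity dataset, the methodology achieves a high correlation value for both word (r = 0.8753) and sentence similarity (r = 0.8793) concerning Rubenstein and Goodenough standard and the SICK dataset (r = 0.83241) outperforming other unsupervised models.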
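The abstract names an edge-based approach over a lexical database but gives no formula. As a rough illustration only, the Python sketch below implements one commonly used edge-based form, sim(w1, w2) = e^(-αl) · tanh(βh), where l is the number of is-a edges between the two words through their lowest common subsumer and h is that subsumer's depth. The toy taxonomy, the function names (`ancestors`, `depth`, `edge_similarity`, `pearson_r`), and the values α = 0.2 and β = 0.45 are assumptions for illustration. This is not the authors' implementation, and it omits the corpus statistics and the sentence-level step the paper describes. The Pearson helper assumes the reported r values are Pearson coefficients against mean human ratings.

```python
import math

# Hypothetical toy is-a taxonomy (child -> parent). It stands in for a
# lexical database such as WordNet and is NOT the resource used in the paper.
PARENT = {
    "entity": None,
    "organism": "entity",
    "animal": "organism",
    "dog": "animal",
    "cat": "animal",
    "artifact": "entity",
    "vehicle": "artifact",
    "car": "vehicle",
}


def ancestors(word):
    """Return the chain from `word` up to the root, starting with `word` itself."""
    chain = []
    while word is not None:
        chain.append(word)
        word = PARENT[word]
    return chain


def depth(word):
    """Number of is-a edges between `word` and the root (the root has depth 0)."""
    return len(ancestors(word)) - 1


def edge_similarity(w1, w2, alpha=0.2, beta=0.45):
    """Edge-based word similarity exp(-alpha * path_len) * tanh(beta * lcs_depth).

    path_len: edges on the path w1 -> lowest common subsumer -> w2.
    lcs_depth: depth of that lowest common subsumer.
    alpha and beta are illustrative values, not parameters taken from the paper.
    """
    up1, up2 = ancestors(w1), ancestors(w2)
    # First node on w1's chain that also lies on w2's chain; the single root
    # guarantees that one exists in this toy taxonomy.
    lcs = next(node for node in up1 if node in up2)
    path_len = up1.index(lcs) + up2.index(lcs)
    lcs_depth = depth(lcs)
    return math.exp(-alpha * path_len) * math.tanh(beta * lcs_depth)


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length, non-constant score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

With the toy taxonomy, `edge_similarity("dog", "cat")` evaluates to e^(-0.4) · tanh(0.9), about 0.48, while `edge_similarity("dog", "car")` is 0 because the only shared ancestor is the root. To mirror the kind of evaluation the abstract reports, one would score each word pair of a benchmark such as Rubenstein and Goodenough against a real lexical database and pass those scores, together with the mean human ratings, to `pearson_r`.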

Bibliographic Details
Main Authors: Atish Pawar, Vijay Mago
Format: Article
Language: English
Published: IEEE 2019-01-01
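Series: IEEE Access
Subjects: Corpus; lexical database; natural language processing; semantic analysis; sentence similarity; word similarity
Online Access: https://ieeexplore.ieee.org/document/8630924/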
collection DOAJ
description The semantic analysis field has a crucial role to play in the research related to text analytics. Calculating the semantic similarity between sentences is a long-standing problem in the area of natural language processing, and it differs significantly as the domain of operation differs. In this paper, we present a methodology that can be applied across multiple domains by incorporating corpora-based statistics into a standardized semantic similarity algorithm. To calculate the semantic similarity between words and sentences, the proposed method follows an edge-based approach using a lexical database. When tested on both benchmark standards and mean human similarity dataset, the methodology achieves a high correlation value for both word (r = 0.8753) and sentence similarity (r = 0.8793) concerning Rubenstein and Goodenough standard and the SICK dataset (r = 0.83241) outperforming other unsupervised models.
first_indexed 2024-12-16T23:28:19Z
id doaj.art-a3f7a8ed31b44092b65e5b7ec5685d0f
institution Directory Open Access Journal
issn 2169-3536
last_indexed 2024-12-16T23:28:19Z
spelling doaj.art-a3f7a8ed31b44092b65e5b7ec5685d0f (2022-12-21T22:11:56Z; language: eng)
IEEE, IEEE Access, ISSN 2169-3536, published 2019-01-01, vol. 7, pp. 16291-16308
DOI: 10.1109/ACCESS.2019.2891692 (IEEE Xplore document 8630924)
Atish Pawar (ORCID: https://orcid.org/0000-0003-4857-4057), Department of Computer Science, Lakehead University, Thunder Bay, Canada
Vijay Mago, Department of Computer Science, Lakehead University, Thunder Bay, Canada
topic Corpus
lexical database
natural language processing
semantic analysis
sentence similarity
word similarity