Challenging the Boundaries of Unsupervised Learning for Semantic Similarity
The semantic analysis field has a crucial role to play in the research related to text analytics. Calculating the semantic similarity between sentences is a long-standing problem in the area of natural language processing, and it differs significantly as the domain of operation differs. In this pape...
Main Authors: | Atish Pawar, Vijay Mago |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2019-01-01 |
Series: | IEEE Access |
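Subjects: | Corpus, lexical database, natural language processing, semantic analysis, sentence similarity, word similarity |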
Online Access: | https://ieeexplore.ieee.org/document/8630924/ |
_version_ | 1818641510659260416 |
---|---|
author | Atish Pawar Vijay Mago |
author_facet | Atish Pawar Vijay Mago |
author_sort | Atish Pawar |
collection | DOAJ |
description | The semantic analysis field has a crucial role to play in the research related to text analytics. Calculating the semantic similarity between sentences is a long-standing problem in the area of natural language processing, and it differs significantly as the domain of operation differs. In this paper, we present a methodology that can be applied across multiple domains by incorporating corpora-based statistics into a standardized semantic similarity algorithm. To calculate the semantic similarity between words and sentences, the proposed method follows an edge-based approach using a lexical database. When tested on both benchmark standards and mean human similarity dataset, the methodology achieves a high correlation value for both word (r = 0.8753) and sentence similarity (r = 0.8793) concerning Rubenstein and Goodenough standard and the SICK dataset (r = 0.83241) outperforming other unsupervised models. |
first_indexed | 2024-12-16T23:28:19Z |
format | Article |
id | doaj.art-a3f7a8ed31b44092b65e5b7ec5685d0f |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-16T23:28:19Z |
publishDate | 2019-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-a3f7a8ed31b44092b65e5b7ec5685d0f 2022-12-21T22:11:56Z eng IEEE IEEE Access 2169-3536 2019-01-01 7 16291 16308 10.1109/ACCESS.2019.2891692 8630924 Challenging the Boundaries of Unsupervised Learning for Semantic Similarity Atish Pawar 0 https://orcid.org/0000-0003-4857-4057 Vijay Mago 1 Department of Computer Science, Lakehead University, Thunder Bay, Canada Department of Computer Science, Lakehead University, Thunder Bay, Canada The semantic analysis field has a crucial role to play in the research related to text analytics. Calculating the semantic similarity between sentences is a long-standing problem in the area of natural language processing, and it differs significantly as the domain of operation differs. In this paper, we present a methodology that can be applied across multiple domains by incorporating corpora-based statistics into a standardized semantic similarity algorithm. To calculate the semantic similarity between words and sentences, the proposed method follows an edge-based approach using a lexical database. When tested on both benchmark standards and mean human similarity dataset, the methodology achieves a high correlation value for both word (r = 0.8753) and sentence similarity (r = 0.8793) concerning Rubenstein and Goodenough standard and the SICK dataset (r = 0.83241) outperforming other unsupervised models. https://ieeexplore.ieee.org/document/8630924/ Corpus lexical database natural language processing semantic analysis sentence similarity word similarity |
spellingShingle | Atish Pawar Vijay Mago Challenging the Boundaries of Unsupervised Learning for Semantic Similarity IEEE Access Corpus lexical database natural language processing semantic analysis sentence similarity word similarity |
title | Challenging the Boundaries of Unsupervised Learning for Semantic Similarity |
title_full | Challenging the Boundaries of Unsupervised Learning for Semantic Similarity |
title_fullStr | Challenging the Boundaries of Unsupervised Learning for Semantic Similarity |
title_full_unstemmed | Challenging the Boundaries of Unsupervised Learning for Semantic Similarity |
title_short | Challenging the Boundaries of Unsupervised Learning for Semantic Similarity |
title_sort | challenging the boundaries of unsupervised learning for semantic similarity |
topic | Corpus lexical database natural language processing semantic analysis sentence similarity word similarity |
url | https://ieeexplore.ieee.org/document/8630924/ |
work_keys_str_mv | AT atishpawar challengingtheboundariesofunsupervisedlearningforsemanticsimilarity AT vijaymago challengingtheboundariesofunsupervisedlearningforsemanticsimilarity |
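
The description in this record mentions an edge-based word similarity over a lexical database and reports correlation values (r) against human similarity judgements. The Python sketch below illustrates both ideas in the simplest possible form and is not the authors' implementation. Everything in it is an assumption made for illustration: the toy `PARENT` hierarchy stands in for a real lexical database, the score `1 / (1 + path length)` is a generic edge-counting measure rather than the formula used in the article, and `pearson_r` is a plain Pearson correlation of the kind an r value usually denotes.

```python
from collections import deque
from math import sqrt

# Hypothetical toy "is-a" hierarchy (child -> parent).
# It is NOT taken from the article or from any real lexical database.
PARENT = {
    "car": "vehicle",
    "bicycle": "vehicle",
    "vehicle": "artifact",
    "cup": "artifact",
    "artifact": "entity",
}


def build_graph(parent):
    """Turn the child -> parent map into an undirected adjacency map."""
    graph = {}
    for child, par in parent.items():
        graph.setdefault(child, set()).add(par)
        graph.setdefault(par, set()).add(child)
    return graph


def path_length(graph, a, b):
    """Number of edges on the shortest path from a to b, or None if unreachable."""
    if a not in graph or b not in graph:
        return None
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def edge_similarity(graph, a, b):
    """Generic edge-counting score (an assumption): 1 / (1 + shortest path length)."""
    dist = path_length(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


GRAPH = build_graph(PARENT)

if __name__ == "__main__":
    pairs = [("car", "bicycle"), ("car", "cup"), ("car", "entity")]
    scores = [edge_similarity(GRAPH, a, b) for a, b in pairs]
    # Made-up human ratings on a 0-4 scale, for illustration only.
    human = [3.1, 1.2, 0.4]
    print(scores)
    print(pearson_r(scores, human))
```

In this sketch, words that sit closer together in the hierarchy receive a higher score, and the correlation step compares those scores with human ratings for the same pairs. A real evaluation would use an actual lexical database and a published benchmark such as the datasets named in the description.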