A novel hybrid methodology for computing semantic similarity between sentences through various word senses

In the area of natural language processing, measuring sentence similarity is an essential problem. Searching for semantic meaning in natural language is a related issue. The task of measuring sentence similarity is to find semantic symmetry in two sentences, no matter how they are arranged. It is i...


Bibliographic Details
Main Authors:	Farooq Ahmad, Dr. Mohammad Faisal
Format:	Article
Language:	English
Published:	KeAi Communications Co., Ltd. 2022-06-01
Series:	International Journal of Cognitive Computing in Engineering
Subjects:	Natural language processing; WordNet; Word embedding; Word overlap; Semantic search; Semantic similarity
Online Access:	http://www.sciencedirect.com/science/article/pii/S2666307422000055

author	Farooq Ahmad, Dr. Mohammad Faisal
collection	DOAJ
description	In natural language processing, measuring sentence similarity is an essential problem. Searching for semantic meaning in natural language is a related issue. The task of measuring sentence similarity is to find semantic symmetry in two sentences, no matter how they are arranged. It is important to measure the similarity of sentences accurately. Existing methods for computing the similarity between sentences have been adapted from approaches designed for large texts. Since these methods work in very high-dimensional spaces, they are inefficient, require human input, and are not flexible enough for some applications. In this study, we propose a hybrid method (HydMethod) which considers not only semantic information, including lexical databases, word embeddings, and corpus statistics, but also implied word order information. With lexical databases, our method models human common sense knowledge, and that knowledge can then be adapted to different domains by incorporating corpus statistics. Therefore, the methodology is applicable across several domains. In our experiments, we used two standard datasets, the Pilot Short Text Semantic Similarity Benchmark and MS paraphrase, to demonstrate the efficacy of the proposed method. The proposed method outperforms the existing approaches when tested on these two datasets, giving the highest correlation value for both word and sentence similarity. Moreover, it achieves an improvement of up to 32% over methodologies based only on word vectors or WordNet. With Rubenstein and Goodenough word & sentence pairs, our algorithm's similarity measure shows a high Pearson correlation coefficient of 0.8953.
first_indexed	2024-04-11T00:29:46Z
format	Article
id	doaj.art-af29face02b74c19b71a81a475969cc9
institution	Directory of Open Access Journals
issn	2666-3074
language	English
last_indexed	2024-04-11T00:29:46Z
publishDate	2022-06-01
publisher	KeAi Communications Co., Ltd.
record_format	Article
series	International Journal of Cognitive Computing in Engineering
spelling	doaj.art-af29face02b74c19b71a81a475969cc9 | 2023-01-08T04:14:59Z | eng | KeAi Communications Co., Ltd. | International Journal of Cognitive Computing in Engineering | 2666-3074 | 2022-06-01 | 35877 | Farooq Ahmad (Department of Computer Application, Integral University, Lucknow, India; corresponding author) | Dr. Mohammad Faisal (Department of Computer Application, Integral University, Lucknow, India) | http://www.sciencedirect.com/science/article/pii/S2666307422000055
title	A novel hybrid methodology for computing semantic similarity between sentences through various word senses
topic	Natural language processing; WordNet; Word embedding; Word overlap; Semantic search; Semantic similarity
url	http://www.sciencedirect.com/science/article/pii/S2666307422000055


Similar Items