UESTS: An Unsupervised Ensemble Semantic Textual Similarity Method

Semantic textual similarity (STS) is the task of assessing the degree of similarity between two texts in terms of meaning. Several approaches have been proposed in the literature to determine the semantic similarity between texts. The most promising work recently presented in the literature was supe...

Bibliographic Details
Main Authors: Basma Hassan, Samir E. Abdelrahman, Reem Bahgat, Ibrahim Farag
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Semantic textual similarity; word alignment; string kernel; BabelNet; SemEval; text processing
Online Access: https://ieeexplore.ieee.org/document/8746255/
author Basma Hassan
Samir E. Abdelrahman
Reem Bahgat
Ibrahim Farag
collection DOAJ
description Semantic textual similarity (STS) is the task of assessing how similar two texts are in meaning. Several approaches have been proposed in the literature to determine the semantic similarity between texts. The most promising recent work has used supervised approaches. Unsupervised STS approaches do not require training data, but they still suffer from some limitations. Word alignment has been widely used in state-of-the-art approaches. Building on these observations, this paper makes three contributions. First, a new synset-oriented word aligner is presented, which relies on BabelNet, a very large multilingual semantic network. Second, three unsupervised STS approaches are proposed: string kernel-based (SK), alignment-based (AL), and weighted alignment-based (WAL). Third, some limitations of the state-of-the-art approaches are tackled, and different similarity methods are shown to be complementary by proposing an unsupervised ensemble STS (UESTS) approach. UESTS combines the merits of four similarity measures: the proposed alignment-based measure, a surface-based measure, a corpus-based measure, and an enhanced edit distance. The experimental results show that including the proposed aligner in STS is effective. Across all the evaluation data sets, the proposed UESTS outperforms the state-of-the-art unsupervised approaches.
first_indexed 2024-04-11T11:45:11Z
format Article
id doaj.art-28e516f6122e464ab96d5b7759f86b13
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-11T11:45:11Z
publishDate 2019-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-28e516f6122e464ab96d5b7759f86b13
2022-12-22T04:25:35Z
eng
IEEE
IEEE Access
2169-3536
2019-01-01
vol. 7, pp. 85462-85482
10.1109/ACCESS.2019.2925006
8746255
UESTS: An Unsupervised Ensemble Semantic Textual Similarity Method
Basma Hassan (https://orcid.org/0000-0002-5040-2935), Computer Science Department, Faculty of Computers and Information, Fayoum University, Fayoum, Egypt
Samir E. Abdelrahman (https://orcid.org/0000-0001-6396-3259), Computer Science Department, Faculty of Computers and Information, Cairo University, Giza, Egypt
Reem Bahgat, Computer Science Department, Faculty of Computers and Information, Cairo University, Giza, Egypt
Ibrahim Farag, Computer Science Department, Faculty of Computers and Information, Cairo University, Giza, Egypt
https://ieeexplore.ieee.org/document/8746255/
Semantic textual similarity; word alignment; string kernel; BabelNet; SemEval; text processing
title UESTS: An Unsupervised Ensemble Semantic Textual Similarity Method
topic Semantic textual similarity
word alignment
string kernel
BabelNet
SemEval
text processing
url https://ieeexplore.ieee.org/document/8746255/
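
The description above says that UESTS combines several complementary similarity measures without training data. The Python sketch below illustrates only that general idea and is not the authors' method. This record gives neither the measures' definitions nor the combination rule, so everything here is an assumption: token-set Jaccard stands in for a surface-based measure, plain Levenshtein distance for the enhanced edit distance, a character n-gram spectrum kernel for the string kernel, and an unweighted mean for the ensemble. The BabelNet-based aligner and the corpus-based measure are omitted because they need external resources. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def tokens(text):
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return text.lower().split()


def surface_similarity(a, b):
    """Jaccard overlap of token sets; a stand-in for a surface-based measure."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def edit_distance(a, b):
    """Plain character-level Levenshtein distance (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """Edit distance rescaled to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def spectrum_kernel_similarity(a, b, n=3):
    """Cosine-normalized character n-gram (spectrum) kernel."""
    def grams(s):
        s = s.lower()
        return Counter(s[i:i + n] for i in range(len(s) - n + 1))

    ga, gb = grams(a), grams(b)
    dot = sum(count * gb[g] for g, count in ga.items())
    norm = sqrt(sum(v * v for v in ga.values())) * sqrt(sum(v * v for v in gb.values()))
    return dot / norm if norm else 0.0


def ensemble_similarity(a, b):
    """Unweighted mean of the three scores (assumed combination rule)."""
    scores = [
        surface_similarity(a, b),
        edit_similarity(a, b),
        spectrum_kernel_similarity(a, b),
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    s1 = "A man is playing a guitar"
    s2 = "A man plays the guitar"
    print(round(ensemble_similarity(s1, s2), 3))
```

Averaging is the simplest way to let measures with different failure modes offset each other; a weighted or rank-based combination would fit the same interface.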