EVALUATING OF EFFICACY SEMANTIC SIMILARITY METHODS FOR COMPARISON OF ACADEMIC THESIS AND DISSERTATION TEXTS

Bibliographic Details
Main Authors: Ramadan T. Hassan, Nawzat S. Ahmed
Format: Article
Language: English
Published: University of Zakho 2023-08-01
Series: Science Journal of University of Zakho
Online Access: http://www.sjuoz.uoz.edu.krd/index.php/sjuoz/article/view/1120
Description
Summary: Detecting semantic similarity between documents is vital in natural language processing (NLP) applications. One widely used way to measure the semantic similarity of text documents is embedding, which converts texts into numerical vectors using various NLP methods. This paper presents a comparative analysis of four embedding methods for detecting semantic similarity in theses and dissertations: Term Frequency–Inverse Document Frequency (TF-IDF), Document to Vector, Sentence Bidirectional Encoder Representations from Transformers, and Bidirectional Encoder Representations from Transformers with cosine similarity. The study used two datasets, one of 27 documents from Duhok Polytechnic University and one of 100 documents from ProQuest.com. The texts of these documents were pre-processed to make them suitable for semantic similarity analysis. The methods were evaluated on several metrics, including accuracy, precision, recall, F1 score, and processing time. The results showed that the traditional method, TF-IDF, outperformed the modern methods in embedding and detecting actual semantic similarity between documents, with processing time not exceeding a few seconds.
ISSN: 2663-628X, 2663-6298
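
As an illustration of the TF-IDF with cosine similarity approach named in the summary, the following is a minimal sketch and not the authors' implementation. The function names, the whitespace tokenizer, the smoothed IDF formula (the variant scikit-learn uses by default), and the three sample texts are all assumptions made for this example; the paper's actual pre-processing and parameters are not given in this record.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for real pre-processing.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            # Relative term frequency times smoothed inverse document frequency.
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sample texts, not taken from either dataset in the study.
docs = [
    "semantic similarity of thesis documents",
    "similarity of dissertation and thesis documents",
    "processing time of neural networks",
]
vectors = tfidf_vectors(docs)
sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
print(f"related: {sim_related:.3f}, unrelated: {sim_unrelated:.3f}")
```

In this sketch the first two texts share several terms and so score higher than the first and third, which share only a common word. Because the vectors are built from word counts alone, computing them is cheap, which is in line with the short processing time the summary reports for TF-IDF.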