An innovative automatic indexing method for Arabic text


Bibliographic Details
Main Authors: Ramzi A. Haraty, Sanaa Kaddoura, Sultan Al Jahdali, Nour K. Masri
Format: Article
Language: English
Published: Academy Publishing Center 2023-03-01
Series: Advances in Computing and Engineering
Online Access: http://apc.aast.edu/ojs/index.php/ACE/article/view/557
Description
Summary: Automatic indexing and text retrieval methods have long been studied for many languages. Automatic indexing is the process of extracting words from a document in order to classify documents by subject and to improve information retrieval. Compared with other languages, research on automated Arabic text categorization remains limited. In this work, we present an innovative method that improves the accuracy of automatic indexing of Arabic texts by introducing and integrating a thesaurus. Our model extracts new relevant words by referring to the created thesaurus, which identifies words, their synonyms, and their correlations. The thesaurus is built using a natural language toolkit whose library lists the synonyms of a given word available in WordNet. Words that share a meaning and frequently appear together are grouped under one umbrella term in a JavaScript Object Notation (JSON) dictionary, making it easy to identify the topic of a text. Our results show a notable improvement in accuracy and efficiency compared with previous work.
ISSN: 2735-5977, 2735-5985
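
Illustration: the summary describes grouping related words under one umbrella term in a JSON dictionary and using that grouping to identify the topic of a text. The following is a minimal sketch of that idea only, not the authors' implementation. The thesaurus contents, the topic names, and the function names `load_inverted_thesaurus`, `index_text`, and `top_topic` are all hypothetical, and the thesaurus is hand-written here rather than derived from WordNet. It also assumes whitespace tokenization with exact matching, which skips the stemming and prefix handling that real Arabic text would need.

```python
import json
from collections import Counter

# Hypothetical, hand-written thesaurus: each umbrella term maps to words
# assumed to share a topic. The paper builds its thesaurus from WordNet
# synonyms; this small dictionary only stands in for that resource.
THESAURUS_JSON = """
{
  "sports": ["كرة", "مباراة", "لاعب", "فريق"],
  "economy": ["اقتصاد", "سوق", "بنك", "استثمار"]
}
"""


def load_inverted_thesaurus(raw_json):
    """Parse the JSON thesaurus and invert it into a word -> umbrella term map."""
    groups = json.loads(raw_json)
    return {word: topic for topic, words in groups.items() for word in words}


def index_text(text, word_to_topic):
    """Count how many tokens of the text fall under each umbrella term.

    Assumption: tokens are separated by whitespace and matched exactly.
    """
    counts = Counter()
    for token in text.split():
        topic = word_to_topic.get(token)
        if topic is not None:
            counts[topic] += 1
    return counts


def top_topic(text, word_to_topic):
    """Return the umbrella term with the most matching tokens, or None."""
    counts = index_text(text, word_to_topic)
    return counts.most_common(1)[0][0] if counts else None
```

Calling `top_topic(text, load_inverted_thesaurus(THESAURUS_JSON))` returns the umbrella term whose word group matches the most tokens in `text`. Inverting the dictionary once keeps each token lookup constant-time, which matters when many documents are indexed against the same thesaurus.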