A DEEP AUTOENCODER-BASED REPRESENTATION FOR ARABIC TEXT CATEGORIZATION

Bibliographic Details
Main Authors: Fatima-zahra El-Alami, Abdelkader El Mahdaouy, Said Ouatik El Alaoui, Noureddine En-Nahnahi
Format: Article
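Language: English
Published: UUM Press 2020-06-01
Series: Journal of ICT
Subjects: Arabic text representation, deep autoencoder, feature selection, machine learning, text categorization
Online Access: https://e-journal.uum.edu.my/index.php/jict/article/view/12388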
author Fatima-zahra El-Alami
Abdelkader El Mahdaouy
Said Ouatik El Alaoui
Noureddine En-Nahnahi
collection DOAJ
description Representing Arabic text is a challenging task for applications such as text categorization and clustering, because Arabic is known for its variety, richness, and complex morphology. Bag-of-Words remains the most common method for Arabic text representation. However, it suffers from several shortcomings, notably a lack of semantics and a high-dimensional feature space. Moreover, most existing methods ignore the explicit knowledge contained in semantic vocabularies such as Arabic WordNet. To overcome these shortcomings, we propose a deep autoencoder-based representation for Arabic text categorization. It consists of three stages: (1) extracting the most relevant concepts from Arabic WordNet through feature selection; (2) learning features for text representation with an unsupervised algorithm; and (3) categorizing text using a deep autoencoder. The method captures document semantics by combining implicit and explicit semantics while reducing the dimensionality of the feature space. To evaluate the method, we conducted several experiments on the standard Arabic dataset OSAC. The results show that the proposed method is effective compared with state-of-the-art methods.
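The idea of compressing a sparse Bag-of-Words vector into a short learned code can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy English corpus in place of OSAC, binary term presence as input, and a single-hidden-layer autoencoder trained with plain gradient descent rather than a deep one. The function names `bag_of_words` and `train_autoencoder` and all parameter values are hypothetical, and the Arabic WordNet concept selection stage is not shown.

```python
import numpy as np

def bag_of_words(docs):
    """Build a sorted vocabulary and a binary document-term matrix."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for row, doc in enumerate(docs):
        for tok in doc.split():
            X[row, index[tok]] = 1.0  # term presence, not counts
    return vocab, X

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, code_size=2, epochs=500, lr=0.5, seed=0):
    """Train a one-hidden-layer autoencoder; return an encoder and the loss history."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, code_size))
    b1 = np.zeros(code_size)
    W2 = rng.normal(0.0, 0.1, (code_size, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)   # encoder: d features -> code_size features
        Y = sigmoid(H @ W2 + b2)   # decoder: reconstruct the input
        err = Y - X
        # Loss: squared reconstruction error, summed over terms, averaged over documents.
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
        # Backpropagation through the two sigmoid layers.
        dZ2 = (2.0 / n) * err * Y * (1.0 - Y)
        dZ1 = (dZ2 @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ dZ2)
        b2 -= lr * dZ2.sum(axis=0)
        W1 -= lr * (X.T @ dZ1)
        b1 -= lr * dZ1.sum(axis=0)

    def encode(A):
        return sigmoid(A @ W1 + b1)

    return encode, losses

# Toy corpus (an assumption for illustration only).
docs = [
    "economy market bank",
    "market bank trade",
    "match team goal",
    "team goal league",
]
vocab, X = bag_of_words(docs)
encode, losses = train_autoencoder(X, code_size=2)
codes = encode(X)  # each document is now a 2-dimensional vector
```

The low-dimensional `codes` could then be passed to any classifier. In the record's description the categorization stage itself uses a deep autoencoder, which this sketch does not reproduce.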
first_indexed 2024-04-13T11:17:51Z
format Article
id doaj.art-a4fbff76d74347bfa98fceb1bb4bcd9d
institution Directory Open Access Journal
issn 1675-414X
2180-3862
language English
last_indexed 2024-04-13T11:17:51Z
publishDate 2020-06-01
publisher UUM Press
record_format Article
series Journal of ICT
spelling doaj.art-a4fbff76d74347bfa98fceb1bb4bcd9d
2022-12-22T02:48:55Z
eng
UUM Press
Journal of ICT
1675-414X
2180-3862
2020-06-01
19
3
A DEEP AUTOENCODER-BASED REPRESENTATION FOR ARABIC TEXT CATEGORIZATION
Fatima-zahra El-Alami (Laboratory of Informatics and Modeling, FSDM, Sidi Mohamed Ben Abdellah University, Morocco)
Abdelkader El Mahdaouy (Laboratory of Informatics and Modeling, FSDM, Sidi Mohamed Ben Abdellah University, Morocco)
Said Ouatik El Alaoui (Laboratory of Informatics and Modeling, FSDM, Sidi Mohamed Ben Abdellah University, Morocco)
Noureddine En-Nahnahi (National School of Applied Sciences, Ibn Tofail University, Morocco)
https://e-journal.uum.edu.my/index.php/jict/article/view/12388
Arabic text representation
deep autoencoder
feature selection
machine learning
text categorization
title A DEEP AUTOENCODER-BASED REPRESENTATION FOR ARABIC TEXT CATEGORIZATION
topic Arabic text representation
deep autoencoder
feature selection
machine learning
text categorization
url https://e-journal.uum.edu.my/index.php/jict/article/view/12388