A DEEP AUTOENCODER-BASED REPRESENTATION FOR ARABIC TEXT CATEGORIZATION
Main Authors: | Fatima-zahra El-Alami, Abdelkader El Mahdaouy, Said Ouatik El Alaoui, Noureddine En-Nahnahi |
---|---|
Format: | Article |
Language: | English |
Published: | UUM Press, 2020-06-01 |
Series: | Journal of ICT |
Subjects: | |
Online Access: | https://e-journal.uum.edu.my/index.php/jict/article/view/12388 |
_version_ | 1811314746005127168 |
---|---|
author | Fatima-zahra El-Alami, Abdelkader El Mahdaouy, Said Ouatik El Alaoui, Noureddine En-Nahnahi |
collection | DOAJ |
description | Arabic text representation is a challenging task for applications such as text categorization and clustering, because the Arabic language is known for its variety, richness and complex morphology. Until recently, Bag-of-Words has remained the most common method for Arabic text representation. However, it suffers from several shortcomings, such as a lack of semantics and a high-dimensional feature space. Moreover, most existing methods ignore the explicit knowledge contained in semantic vocabularies such as Arabic WordNet. To overcome these shortcomings, we proposed a deep Autoencoder-based representation for Arabic text categorization. It consisted of three stages: (1) extracting the most relevant concepts from Arabic WordNet using feature selection; (2) learning features for text representation via an unsupervised algorithm; and (3) categorizing text using a deep Autoencoder. Our method took document semantics into account by combining implicit and explicit semantics, while reducing the dimensionality of the feature space. To evaluate our method, we conducted several experiments on the standard Arabic dataset OSAC. The results showed the effectiveness of the proposed method compared with state-of-the-art methods. |
first_indexed | 2024-04-13T11:17:51Z |
format | Article |
id | doaj.art-a4fbff76d74347bfa98fceb1bb4bcd9d |
institution | Directory of Open Access Journals |
issn | 1675-414X, 2180-3862 |
language | English |
last_indexed | 2024-04-13T11:17:51Z |
publishDate | 2020-06-01 |
publisher | UUM Press |
record_format | Article |
series | Journal of ICT |
spelling | doaj.art-a4fbff76d74347bfa98fceb1bb4bcd9d, 2022-12-22T02:48:55Z, eng, UUM Press, Journal of ICT, 1675-414X, 2180-3862, 2020-06-01, 193. Affiliations: Fatima-zahra El-Alami, Abdelkader El Mahdaouy and Said Ouatik El Alaoui, Laboratory of Informatics and Modeling, FSDM, Sidi Mohamed Ben Abdellah University, Morocco; Noureddine En-Nahnahi, National School of Applied Sciences, Ibn Tofail University, Morocco. Keywords: Arabic text representation; deep autoencoder; feature selection; machine learning; text categorization |
title | A DEEP AUTOENCODER-BASED REPRESENTATION FOR ARABIC TEXT CATEGORIZATION |
topic | Arabic text representation; deep autoencoder; feature selection; machine learning; text categorization |
url | https://e-journal.uum.edu.my/index.php/jict/article/view/12388 |
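The record only outlines the three-stage method in its abstract. The Python/NumPy sketch below illustrates that outline under explicit assumptions and is not the authors' implementation. Documents are assumed to be already mapped to Arabic WordNet concept counts (that mapping is not shown). Chi-square is used as the feature-selection score, although the abstract does not name one. A shallow, single-hidden-layer autoencoder stands in for the deep autoencoder, and a nearest-centroid classifier on the learned codes stands in for the categorization step. All function and class names (`chi2_scores`, `select_top_k`, `Autoencoder`, `nearest_centroid_fit`, `nearest_centroid_predict`) are hypothetical, and the toy matrix is invented, not OSAC data.

```python
import numpy as np


def chi2_scores(X, y):
    """Chi-square statistic of each non-negative feature column against class labels.

    X: (n_docs, n_features) concept-count matrix; y: (n_docs,) class labels.
    Observed = per-class feature totals; expected = feature totals split by class prior.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    observed = np.stack([X[y == c].sum(axis=0) for c in classes])
    class_prior = np.array([(y == c).mean() for c in classes])[:, None]
    expected = class_prior * X.sum(axis=0)[None, :]
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(expected > 0, (observed - expected) ** 2 / expected, 0.0)
    return terms.sum(axis=0)


def select_top_k(X, y, k):
    """Indices (ascending) of the k highest-scoring concept columns."""
    scores = chi2_scores(X, y)
    return np.sort(np.argsort(scores)[::-1][:k])


class Autoencoder:
    """One sigmoid hidden layer and a linear output layer, trained by full-batch
    gradient descent on squared reconstruction error. Hypothetical shallow stand-in
    for a deep autoencoder; deeper stacks repeat the same forward/backward pattern."""

    def __init__(self, n_in, n_hidden, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b2 = np.zeros(n_in)
        self.lr = lr

    def encode(self, X):
        # Low-dimensional document code with entries in (0, 1)
        return 1.0 / (1.0 + np.exp(-(X @ self.W1 + self.b1)))

    def reconstruction_error(self, X):
        X_hat = self.encode(X) @ self.W2 + self.b2
        return float(np.mean((X_hat - X) ** 2))

    def fit(self, X, epochs=3000):
        n = X.shape[0]
        for _ in range(epochs):
            H = self.encode(X)
            X_hat = H @ self.W2 + self.b2
            err = (X_hat - X) / n  # gradient of (1/2n) * sum of squared errors w.r.t. X_hat
            dZ1 = (err @ self.W2.T) * H * (1.0 - H)  # back through the sigmoid (uses old W2)
            self.W2 -= self.lr * (H.T @ err)
            self.b2 -= self.lr * err.sum(axis=0)
            self.W1 -= self.lr * (X.T @ dZ1)
            self.b1 -= self.lr * dZ1.sum(axis=0)
        return self


def nearest_centroid_fit(Z, y):
    y = np.asarray(y)
    classes = np.unique(y)
    return classes, np.stack([Z[y == c].mean(axis=0) for c in classes])


def nearest_centroid_predict(model, Z):
    classes, centroids = model
    dist = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]


# Toy concept-count matrix (invented): 6 documents x 5 concepts, 2 classes.
X = np.array([[3, 2, 0, 0, 1], [4, 3, 0, 1, 1], [2, 3, 1, 0, 1],
              [0, 0, 3, 4, 1], [1, 0, 4, 2, 1], [0, 1, 3, 3, 1]])
y = np.array([0, 0, 0, 1, 1, 1])

keep = select_top_k(X, y, k=4)   # concept 4 occurs equally in both classes, so it is dropped
X_sel = np.log1p(X[:, keep])     # damp raw counts before the autoencoder
ae = Autoencoder(n_in=X_sel.shape[1], n_hidden=2)
before = ae.reconstruction_error(X_sel)
ae.fit(X_sel)
Z = ae.encode(X_sel)             # 6 x 2 document codes
pred = nearest_centroid_predict(nearest_centroid_fit(Z, y), Z)
```

For brevity, the sketch fits every stage on all six documents. In a real evaluation, the feature selector and the autoencoder would be fitted on training documents only and then applied to held-out documents.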