A Superior Arabic Text Categorization Deep Model (SATCDM)

Categorizing Arabic text documents is an important research topic in Natural Language Processing (NLP) and Machine Learning (ML). The number of Arabic documents increases tremendously every day as new web pages, news articles, and social media content are added. Hence, classifyi...

Bibliographic Details
Main Authors:	M. Alhawarat, Ahmad O. Aseeri
Format:	Article
Language:	English
Published:	IEEE 2020-01-01
Series:	IEEE Access
Subjects:	Documents classification; deep learning; Arabic language; convolutional neural networks; word embedding; skip-gram
Online Access:	https://ieeexplore.ieee.org/document/8976160/

collection	DOAJ
description	Categorizing Arabic text documents is an important research topic in Natural Language Processing (NLP) and Machine Learning (ML). The number of Arabic documents increases tremendously every day as new web pages, news articles, and social media content are added. Hence, classifying such documents into specific classes is of high importance to many people and applications. Convolutional Neural Networks (CNNs) are a class of deep learning models that have been shown to be useful for many NLP tasks, including text translation and text categorization for the English language. Word embedding is a text representation that maps text terms to real-valued vectors in a vector space, capturing both syntactic and semantic traits of the text. Current research studies on classifying Arabic text documents use traditional text representations such as bag-of-words and TF-IDF weighting, but few use word embedding. Traditional ML algorithms have already been applied to Arabic text categorization and have achieved good results. In this study, we present a Multi-Kernel CNN model for classifying Arabic news documents enriched with n-gram word embedding, which we call A Superior Arabic Text Categorization Deep Model (SATCDM). The proposed solution achieves very high accuracy compared to current research in Arabic text categorization on 15 freely available datasets. The model achieves an accuracy ranging from 97.58% to 99.90%, which is superior to the results of similar studies on the Arabic document classification task.
first_indexed	2024-12-14T14:52:04Z
format	Article
id	doaj.art-1d2a0f4d259647d791f0f18591cb9b94
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-12-14T14:52:04Z
publishDate	2020-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-1d2a0f4d259647d791f0f18591cb9b94; 2022-12-21T22:57:06Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2020-01-01; vol. 8, pp. 24653-24661; DOI 10.1109/ACCESS.2020.2970504; IEEE document 8976160; A Superior Arabic Text Categorization Deep Model (SATCDM); M. Alhawarat (https://orcid.org/0000-0002-2608-6573) and Ahmad O. Aseeri (https://orcid.org/0000-0002-4234-4069), both of the Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia; https://ieeexplore.ieee.org/document/8976160/; Documents classification; deep learning; Arabic language; convolutional neural networks; word embedding; skip-gram
