Machine Learning for Arabic Text Classification: A Comparative Study

The ultimate aim of Machine Learning (ML) is to make machines act like humans. In particular, ML algorithms are widely used to classify texts. Text classification is the process of assigning texts to a predefined set of categories based on their content. It contributes to improving inform...


Bibliographic Details
Main Authors: Djelloul Bouchiha, Abdelghani Bouziane, Noureddine Doumi
Format: Article
Language: English
Published: Penteract Technology 2022-10-01
Series: Malaysian Journal of Science and Advanced Technology
Subjects: Arabic text classification; Natural language processing; Feature extraction; Machine Learning
Online Access: https://mjsat.com.my/index.php/mjsat/article/view/83
collection DOAJ
description The ultimate aim of Machine Learning (ML) is to make machines act like humans. In particular, ML algorithms are widely used to classify texts. Text classification is the process of assigning texts to a predefined set of categories based on their content. It contributes to improving information retrieval on the Web. In this paper, we focus on Arabic text classification, since a large community worldwide uses this language. The Arabic text classification process consists of three main steps: preprocessing, feature extraction, and applying an ML algorithm. This paper presents a comparative empirical study to determine which combination (feature extraction technique, ML algorithm) performs well on Arabic documents. To this end, we implemented one hundred sixty classifiers by combining 5 feature extraction techniques with 32 machine learning algorithms (5 × 32 = 160). We then made these classifiers open access for the benefit of the AI and NLP communities. Experiments were carried out on a large open dataset. The comparison reveals that TFIDF-Perceptron is the best-performing combination.
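As an illustration of the feature extraction and ML algorithm steps named in the description, the following is a minimal sketch of a TF-IDF plus Perceptron combination in plain Python. It is not the authors' implementation, which this record does not describe. The toy documents, the labels, the whitespace tokenizer standing in for Arabic preprocessing, the smoothed IDF formula, and all function names are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Placeholder for preprocessing: whitespace split only
    # (no Arabic normalization, stop-word removal, or stemming).
    return text.split()

def fit_idf(docs):
    # Smoothed inverse document frequency (an assumed variant).
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    # Sparse, L2-normalized TF-IDF vector; unseen terms are ignored.
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def score(weights, vec):
    return sum(weights.get(t, 0.0) * v for t, v in vec.items())

def train_perceptron(vectors, labels, epochs=10):
    # Multiclass perceptron: on a mistake, move the gold class's weights
    # toward the example and the predicted class's weights away from it.
    classes = sorted(set(labels))
    weights = {c: defaultdict(float) for c in classes}
    for _ in range(epochs):
        for vec, gold in zip(vectors, labels):
            pred = max(classes, key=lambda c: score(weights[c], vec))
            if pred != gold:
                for t, v in vec.items():
                    weights[gold][t] += v
                    weights[pred][t] -= v
    return weights

def predict(weights, vec):
    return max(sorted(weights), key=lambda c: score(weights[c], vec))

# Hypothetical toy corpus (not the dataset used in the paper).
train_docs = [
    "مباراة كرة القدم فريق",
    "فريق مباراة هدف",
    "بنك سوق أسعار",
    "اقتصاد بنك أسعار",
]
train_labels = ["sports", "sports", "economy", "economy"]

idf = fit_idf(train_docs)
weights = train_perceptron([tfidf(d, idf) for d in train_docs], train_labels)
print(predict(weights, tfidf("فريق كرة القدم", idf)))
```

Swapping `fit_idf`/`tfidf` for another feature extractor, or `train_perceptron` for another learner, is how one would enumerate feature-algorithm combinations in a comparison of this kind.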
first_indexed 2024-03-12T13:22:02Z
format Article
id doaj.art-167e0bba49a54618b61703218d99df76
institution Directory Open Access Journal
issn 2785-8901
language English
last_indexed 2024-03-12T13:22:02Z
publishDate 2022-10-01
publisher Penteract Technology
record_format Article
series Malaysian Journal of Science and Advanced Technology
spelling doaj.art-167e0bba49a54618b61703218d99df76 | 2023-08-25T14:41:25Z | eng | Penteract Technology | Malaysian Journal of Science and Advanced Technology | 2785-8901 | 2022-10-01 | 2 | 4 | 10.56532/mjsat.v2i4.83 | 83
Machine Learning for Arabic Text Classification: A Comparative Study
Djelloul Bouchiha (0), Abdelghani Bouziane (1), Noureddine Doumi (2)
(0) Ctr Univ Naama, Inst. Sciences and Technologies, Dept. Mathematics and Computer Science, EEDIS Lab., UDL-SBA, Algeria
(1) Ctr Univ Naama, Inst. Sciences and Technologies, Dept. Mathematics and Computer Science, EEDIS Lab., UDL-SBA, Algeria
(2) University of Saida, Faculty of Technologies, Department of Computer Science, Algeria
https://mjsat.com.my/index.php/mjsat/article/view/83
Arabic text classification | Natural language processing | Feature extraction | Machine Learning
topic Arabic text classification
Natural language processing
Feature extraction
Machine Learning