Machine Learning for Arabic Text Classification: A Comparative Study

Bibliographic Details
Main Authors:	Djelloul Bouchiha, Abdelghani Bouziane, Noureddine Doumi
Format:	Article
Language:	English
Published:	Penteract Technology 2022-10-01
Series:	Malaysian Journal of Science and Advanced Technology
Subjects:	Arabic text classification; Natural language processing; Feature extraction; Machine Learning
Online Access:	https://mjsat.com.my/index.php/mjsat/article/view/83

_version_	1797736998382862336
author	Djelloul Bouchiha; Abdelghani Bouziane; Noureddine Doumi
author_facet	Djelloul Bouchiha; Abdelghani Bouziane; Noureddine Doumi
author_sort	Djelloul Bouchiha
collection	DOAJ
description	The ultimate aim of Machine Learning (ML) is to make machines act like humans. In particular, ML algorithms are widely used to classify texts. Text classification is the process of assigning texts to a predefined set of categories based on their content, and it helps improve information retrieval on the Web. In this paper, we focus on Arabic text classification, since Arabic is used by a large community worldwide. The Arabic text classification process consists of three main steps: preprocessing, feature extraction, and classification with an ML algorithm. This paper presents a comparative empirical study to determine which combination of feature extraction technique and ML algorithm performs best on Arabic documents. To this end, we implemented 160 classifiers by combining 5 feature extraction techniques with 32 ML algorithms, and we have made these classifiers openly available for the benefit of the AI and NLP communities. Experiments were carried out on a large open dataset. The comparison shows that TFIDF-Perceptron (TF-IDF features with a Perceptron classifier) is the best-performing combination.
first_indexed	2024-03-12T13:22:02Z
format	Article
id	doaj.art-167e0bba49a54618b61703218d99df76
institution	Directory Open Access Journal
issn	2785-8901
language	English
last_indexed	2024-03-12T13:22:02Z
publishDate	2022-10-01
publisher	Penteract Technology
record_format	Article
series	Malaysian Journal of Science and Advanced Technology
spelling	doaj.art-167e0bba49a54618b61703218d99df76 | 2023-08-25T14:41:25Z | eng | Penteract Technology | Malaysian Journal of Science and Advanced Technology | 2785-8901 | 2022-10-01 | 2 | 4 | 10.56532/mjsat.v2i4.83 | 83 | Machine Learning for Arabic Text Classification: A Comparative Study | Djelloul Bouchiha (0) | Abdelghani Bouziane (1) | Noureddine Doumi (2) | Ctr Univ Naama, Inst. Sciences and Technologies, Dept. Mathematics and Computer Science, EEDIS Lab., UDL-SBA, Algeria | Ctr Univ Naama, Inst. Sciences and Technologies, Dept. Mathematics and Computer Science, EEDIS Lab., UDL-SBA, Algeria | University of Saida, Faculty of Technologies, Department of Computer Science, Algeria | https://mjsat.com.my/index.php/mjsat/article/view/83 | Arabic text classification; Natural language processing; Feature extraction; Machine Learning
title	Machine Learning for Arabic Text Classification: A Comparative Study
title_full	Machine Learning for Arabic Text Classification: A Comparative Study
title_fullStr	Machine Learning for Arabic Text Classification: A Comparative Study
title_full_unstemmed	Machine Learning for Arabic Text Classification: A Comparative Study
title_short	Machine Learning for Arabic Text Classification: A Comparative Study
title_sort	machine learning for arabic text classification a comparative study
topic	Arabic text classification; Natural language processing; Feature extraction; Machine Learning
url	https://mjsat.com.my/index.php/mjsat/article/view/83
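
The description outlines a three-step pipeline (preprocessing, feature extraction, ML algorithm) and reports TFIDF-Perceptron as the best combination. Below is a minimal, dependency-free Python sketch of that kind of pipeline, not the authors' implementation. Everything here is an assumption for illustration: the Arabic normalization steps (removing diacritics and tatweel, unifying alef variants), the smoothed IDF formula, the plain multiclass perceptron without a bias term, the toy sentences and labels, and the hypothetical helper names `preprocess`, `fit_idf`, `tfidf`, `train` and `classify`.

```python
import math
import re
from collections import Counter

# Assumed preprocessing: a few common Arabic normalization steps.
# The paper's actual preprocessing may differ.
DIACRITICS = re.compile("[\u064B-\u0652]")  # short-vowel marks (tashkeel)
TATWEEL = "\u0640"                           # elongation character
ALEF_MAP = str.maketrans({"\u0623": "\u0627", "\u0625": "\u0627", "\u0622": "\u0627"})


def preprocess(text):
    """Strip diacritics and tatweel, unify alef variants, split into tokens."""
    text = DIACRITICS.sub("", text).replace(TATWEEL, "").translate(ALEF_MAP)
    return re.findall(r"\w+", text)


def fit_idf(token_lists):
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def tfidf(tokens, idf):
    """Raw term counts times IDF, L2-normalized; terms unseen in training are dropped."""
    vec = {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def score(w, x):
    """Dot product of a sparse weight vector and a sparse feature vector."""
    return sum(w[t] * v for t, v in x.items())


def predict(weights, x):
    # max() returns the first maximal key, so ties go to the first class in sorted order.
    return max(weights, key=lambda c: score(weights[c], x))


def train(texts, labels, epochs=10):
    """Fit IDF on the training texts, then a multiclass perceptron on TF-IDF vectors."""
    token_lists = [preprocess(t) for t in texts]
    idf = fit_idf(token_lists)
    vectors = [tfidf(toks, idf) for toks in token_lists]
    weights = {c: Counter() for c in sorted(set(labels))}
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(vectors, labels):
            guess = predict(weights, x)
            if guess != y:
                # Move the true class toward x and the wrong guess away from it.
                for t, v in x.items():
                    weights[y][t] += v
                    weights[guess][t] -= v
                mistakes += 1
        if mistakes == 0:  # every training example classified correctly; stop early
            break
    return idf, weights


def classify(text, idf, weights):
    return predict(weights, tfidf(preprocess(text), idf))


# Toy usage with made-up sentences (not the paper's dataset).
texts = ["فاز الفريق بالمباراة", "سجل اللاعب هدفا في المباراة",
         "ارتفعت أسعار النفط في السوق", "انخفضت أسعار الأسهم في السوق"]
labels = ["sport", "sport", "economy", "economy"]
idf, weights = train(texts, labels)
print(classify("اللاعب في المباراة", idf, weights))  # sport
```

Writing the pieces out by hand keeps the mechanics visible. In practice, scikit-learn's `TfidfVectorizer` and `Perceptron` provide the same combination. The smoothed IDF above mirrors `TfidfVectorizer`'s default, though that library's tokenizer would not apply the Arabic normalization sketched here.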

Similar Items