Machine Learning for Arabic Text Classification: A Comparative Study
The ultimate aim of Machine Learning (ML) is to make machines act like humans. In particular, ML algorithms are widely used to classify texts. Text classification is the process of assigning texts to a predefined set of categories based on their content. It contributes to improving inform...
Main Authors: | Djelloul Bouchiha, Abdelghani Bouziane, Noureddine Doumi |
---|---|
Format: | Article |
Language: | English |
Published: | Penteract Technology, 2022-10-01 |
Series: | Malaysian Journal of Science and Advanced Technology |
Subjects: | Arabic text classification, Natural language processing, Feature extraction, Machine Learning |
Online Access: | https://mjsat.com.my/index.php/mjsat/article/view/83 |
_version_ | 1797736998382862336 |
---|---|
author | Djelloul Bouchiha; Abdelghani Bouziane; Noureddine Doumi |
author_sort | Djelloul Bouchiha |
collection | DOAJ |
description | The ultimate aim of Machine Learning (ML) is to make machines act like humans. In particular, ML algorithms are widely used to classify texts. Text classification is the process of assigning texts to a predefined set of categories based on their content. It contributes to improving information retrieval on the Web. In this paper, we focus on Arabic text classification, since a large community worldwide uses this language. The Arabic text classification process consists of three main steps: preprocessing, feature extraction, and the ML algorithm. This paper presents a comparative empirical study to determine which combination of feature extraction technique and ML algorithm performs well on Arabic documents. To that end, we implemented 160 classifiers by combining 5 feature extraction techniques with 32 machine learning algorithms. We then made these classifiers open access for the benefit of the AI and NLP communities. Experiments were carried out on a large open dataset. The comparison reveals that TFIDF-Perceptron is the best-performing combination. |
first_indexed | 2024-03-12T13:22:02Z |
format | Article |
id | doaj.art-167e0bba49a54618b61703218d99df76 |
institution | Directory of Open Access Journals |
issn | 2785-8901 |
language | English |
last_indexed | 2024-03-12T13:22:02Z |
publishDate | 2022-10-01 |
publisher | Penteract Technology |
record_format | Article |
series | Malaysian Journal of Science and Advanced Technology |
spelling | doaj.art-167e0bba49a54618b61703218d99df76; 2023-08-25T14:41:25Z; eng; Penteract Technology; Malaysian Journal of Science and Advanced Technology; 2785-8901; 2022-10-01; vol. 2, no. 4; 10.56532/mjsat.v2i4.83; 83; Machine Learning for Arabic Text Classification: A Comparative Study; Djelloul Bouchiha (Ctr Univ Naama, Inst. Sciences and Technologies, Dept. Mathematics and Computer Science, EEDIS Lab., UDL-SBA, Algeria); Abdelghani Bouziane (Ctr Univ Naama, Inst. Sciences and Technologies, Dept. Mathematics and Computer Science, EEDIS Lab., UDL-SBA, Algeria); Noureddine Doumi (University of Saida, Faculty of Technologies, Department of Computer Science, Algeria); https://mjsat.com.my/index.php/mjsat/article/view/83; Arabic text classification; Natural language processing; Feature extraction; Machine Learning |
title | Machine Learning for Arabic Text Classification: A Comparative Study |
topic | Arabic text classification; Natural language processing; Feature extraction; Machine Learning |
url | https://mjsat.com.my/index.php/mjsat/article/view/83 |