Analytics of machine learning-based algorithms for text classification

Text classification is the most vital area in natural language processing in which text data is automatically sorted into a predefined set of classes. The application of text classification is wide in commercial works like spam filtering, decision making, extracting information from raw data, and ma...

Bibliographic Details
Main Authors: Sayar Ul Hassan, Jameel Ahamed, Khaleel Ahmad
Format: Article
Language: English
Published: KeAi Communications Co. Ltd. 2022-01-01
Series: Sustainable Operations and Computers
Online Access: http://www.sciencedirect.com/science/article/pii/S2666412722000101
collection DOAJ
description Text classification is a vital area of natural language processing in which text data is automatically sorted into a predefined set of classes. It is widely applied in commercial settings such as spam filtering, decision making, extracting information from raw data, and many other applications. Text classification is significant for many enterprises because it eliminates the need for manual data classification, which is more expensive and time-consuming. This paper presents a comparative analysis of text classification in which the efficiency of different machine learning algorithms on different datasets is analyzed and compared. The machine learning algorithms used in this work are Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Logistic Regression (LR), Multinomial Naïve Bayes (MNB), and Random Forest (RF). Two different datasets are used to compare these algorithms. The paper further analyzes the techniques on the basis of four performance metrics: accuracy, precision, recall, and F1-score. The results reveal that Logistic Regression and Support Vector Machine outperform the other models on the IMDB dataset, and k-NN outperforms the other models on the SPAM dataset.
first_indexed 2024-04-11T04:51:12Z
format Article
id doaj.art-8e82873bedc84fd2a61e5b3faf81730b
institution Directory Open Access Journal
issn 2666-4127
language English
last_indexed 2024-04-11T04:51:12Z
spelling doaj.art-8e82873bedc84fd2a61e5b3faf81730b; 2022-12-27T04:37:47Z; eng; KeAi Communications Co. Ltd.; Sustainable Operations and Computers; 2666-4127; 2022-01-01; volume 3, pages 238-248
Analytics of machine learning-based algorithms for text classification
Sayar Ul Hassan (0), Jameel Ahamed (1, corresponding author), Khaleel Ahmad (2)
Affiliation (0, 1, 2): Department of Computer Science & Information Technology, Maulana Azad National Urdu University, Hyderabad, Telangana, India
http://www.sciencedirect.com/science/article/pii/S2666412722000101
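
The description above names four performance metrics (accuracy, precision, recall, and F1-score) without saying how they are computed. The following is a minimal sketch of the standard binary definitions, not the authors' evaluation code. The function name `binary_metrics`, the "spam"/"ham" labels, and the toy predictions are hypothetical and chosen only for illustration; the record does not say whether the paper used binary, macro, or weighted averaging.

```python
def binary_metrics(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall and F1 for one positive class.

    Assumption: a binary setting with a single class of interest.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard every ratio against an empty denominator.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels, not data from the paper.
y_true = ["spam", "ham", "spam", "ham", "spam", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "ham", "spam", "spam", "ham", "spam"]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

Comparing several classifiers on the same held-out split would then amount to calling such a function once per model and tabulating the four values side by side.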