Analytics of machine learning-based algorithms for text classification

Text classification is the most vital area in natural language processing in which text data is automatically sorted into a predefined set of classes. The application of text classification is wide in commercial works like spam filtering, decision making, extracting information from raw data, and ma...

Bibliographic Details
Main Authors:	Sayar Ul Hassan, Jameel Ahamed, Khaleel Ahmad
Format:	Article
Language:	English
Published:	KeAi Communications Co. Ltd. 2022-01-01
Series:	Sustainable Operations and Computers
Online Access:	http://www.sciencedirect.com/science/article/pii/S2666412722000101

author	Sayar Ul Hassan Jameel Ahamed Khaleel Ahmad
collection	DOAJ
description	Text classification is the most vital area in natural language processing in which text data is automatically sorted into a predefined set of classes. The application of text classification is wide in commercial works like spam filtering, decision making, extracting information from raw data, and many other applications. Text classification is more significant for many enterprises since it eliminates the need for manual data classification, a more expensive and time-consuming mechanism. In this paper, a comparative analysis of text classification is done in which the efficiency of different machine learning algorithms on different datasets is analyzed and compared. Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Logistic Regression (LR), Multinomial Naïve Bayes (MNB), and Random Forest (RF) are the machine learning-based algorithms used in this work. Two different datasets are used to make a comparative analysis of these algorithms. This paper further analyzes the machine learning techniques employed for text classification on the basis of performance metrics, viz. accuracy, precision, recall, and F1-score. The results reveal that Logistic Regression and Support Vector Machine outperform the other models on the IMDB dataset, and k-NN outperforms the other models on the SPAM dataset, as per the results obtained from the proposed system.
first_indexed	2024-04-11T04:51:12Z
format	Article
id	doaj.art-8e82873bedc84fd2a61e5b3faf81730b
institution	Directory Open Access Journal
issn	2666-4127
language	English
last_indexed	2024-04-11T04:51:12Z
publishDate	2022-01-01
publisher	KeAi Communications Co. Ltd.
record_format	Article
series	Sustainable Operations and Computers
title	Analytics of machine learning-based algorithms for text classification
url	http://www.sciencedirect.com/science/article/pii/S2666412722000101

Similar Items