Analytics of machine learning-based algorithms for text classification
Text classification is the most vital area in natural language processing in which text data is automatically sorted into a predefined set of classes. The application of text classification is wide in commercial works like spam filtering, decision making, extracting information from raw data, and ma...
Main Authors: | Sayar Ul Hassan, Jameel Ahamed, Khaleel Ahmad |
---|---|
Format: | Article |
Language: | English |
Published: | KeAi Communications Co. Ltd. 2022-01-01 |
Series: | Sustainable Operations and Computers |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2666412722000101 |
_version_ | 1828086141635526656 |
---|---|
author | Sayar Ul Hassan Jameel Ahamed Khaleel Ahmad |
author_facet | Sayar Ul Hassan Jameel Ahamed Khaleel Ahmad |
author_sort | Sayar Ul Hassan |
collection | DOAJ |
description | Text classification is the most vital area in natural language processing in which text data is automatically sorted into a predefined set of classes. The application of text classification is wide in commercial works like spam filtering, decision making, extracting information from raw data, and many other applications. Text classification is more significant for many enterprises since it eliminates the need for manual data classification, a more expensive and time-consuming mechanism. In this paper, a comparative analysis of text classification is done in which the efficiency of different machine learning algorithms on different datasets is analyzed and compared. Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Logistic Regression (LR), Multinomial Naïve Bayes (MNB), and Random Forest (RF) are the machine learning-based algorithms used in this work. Two different datasets are used to make a comparative analysis of these algorithms. This paper further analyzes the machine learning techniques employed for text classification on the basis of performance metrics, viz. accuracy, precision, recall and F1-score. The results reveal that Logistic Regression and Support Vector Machine outperform the other models on the IMDB dataset, and k-NN outperforms the other models on the SPAM dataset, as per the results obtained from the proposed system. |
first_indexed | 2024-04-11T04:51:12Z |
format | Article |
id | doaj.art-8e82873bedc84fd2a61e5b3faf81730b |
institution | Directory Open Access Journal |
issn | 2666-4127 |
language | English |
last_indexed | 2024-04-11T04:51:12Z |
publishDate | 2022-01-01 |
publisher | KeAi Communications Co. Ltd. |
record_format | Article |
series | Sustainable Operations and Computers |
spelling | doaj.art-8e82873bedc84fd2a61e5b3faf81730b2022-12-27T04:37:47ZengKeAi Communications Co. Ltd.Sustainable Operations and Computers2666-41272022-01-013238248Analytics of machine learning-based algorithms for text classificationSayar Ul Hassan0Jameel Ahamed1Khaleel Ahmad2Department of Computer Science & Information Technology, Maulana Azad National Urdu University, Hyderabad, Telangana, IndiaDepartment of Computer Science & Information Technology, Maulana Azad National Urdu University, Hyderabad, Telangana, India; Corresponding author.Department of Computer Science & Information Technology, Maulana Azad National Urdu University, Hyderabad, Telangana, IndiaText classification is the most vital area in natural language processing in which text data is automatically sorted into a predefined set of classes. The application of text classification is wide in commercial works like spam filtering, decision making, extracting information from raw data, and many other applications. Text classification is more significant for many enterprises since it eliminates the need for manual data classification, a more expensive and time-consuming mechanism. In this paper, a comparative analysis of text classification is done in which the efficiency of different machine learning algorithms on different datasets is analyzed and compared. Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Logistic Regression (LR), Multinomial Naïve Bayes (MNB), and Random Forest (RF) are Machine Learning based algorithms used in this work. Two different datasets are used to make a comparative analysis of these algorithms. This paper further analyzes the machine learning techniques employed for text classification on the basis of performance metrics viz accuracy, precision, recall and f1- score. 
The results reveal that Logistic Regression and Support Vector Machine outperform the other models on the IMDB dataset, and k-NN outperforms the other models on the SPAM dataset, as per the results obtained from the proposed system.http://www.sciencedirect.com/science/article/pii/S2666412722000101 |
spellingShingle | Sayar Ul Hassan Jameel Ahamed Khaleel Ahmad Analytics of machine learning-based algorithms for text classification Sustainable Operations and Computers |
title | Analytics of machine learning-based algorithms for text classification |
title_full | Analytics of machine learning-based algorithms for text classification |
title_fullStr | Analytics of machine learning-based algorithms for text classification |
title_full_unstemmed | Analytics of machine learning-based algorithms for text classification |
title_short | Analytics of machine learning-based algorithms for text classification |
title_sort | analytics of machine learning based algorithms for text classification |
url | http://www.sciencedirect.com/science/article/pii/S2666412722000101 |
work_keys_str_mv | AT sayarulhassan analyticsofmachinelearningbasedalgorithmsfortextclassification AT jameelahamed analyticsofmachinelearningbasedalgorithmsfortextclassification AT khaleelahmad analyticsofmachinelearningbasedalgorithmsfortextclassification |
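
The description in this record names four performance metrics (accuracy, precision, recall and F1-score) without defining them. As a minimal sketch, assuming a binary task such as spam filtering, they can be computed from true and predicted labels as shown here. The function name `binary_metrics` and the toy labels are hypothetical illustrations, not code or data from the article.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class.

    Hypothetical helper for illustration; not taken from the article.
    """
    pairs = list(zip(y_true, y_pred))
    # Count the confusion-matrix cells relative to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (1 = spam, 0 = not spam), invented for this example only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

With these toy labels there are three true positives, one false positive, one false negative and three true negatives, so all four metrics come out to 0.75. On imbalanced data the metrics typically diverge, which is one reason a comparison of classifiers reports all of them rather than accuracy alone.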