Tweets Classification on the Base of Sentiments for US Airline Companies

The use of data from social networks such as Twitter has increased over the last few years to improve political campaigns, the quality of products and services, sentiment analysis, etc. Tweet classification based on user sentiments is a collaborative and important task for many organizations. Th...


Bibliographic Details
Main Authors:	Furqan Rustam, Imran Ashraf, Arif Mehmood, Saleem Ullah, Gyu Sang Choi
Format:	Article
Language:	English
Published:	MDPI AG 2019-11-01
Series:	Entropy
Subjects:	text mining; text classification; sentiment analysis; supervised machine learning; ensemble classifier; long short-term memory network
Online Access:	https://www.mdpi.com/1099-4300/21/11/1078

_version_	1828278212393697280
author	Furqan Rustam, Imran Ashraf, Arif Mehmood, Saleem Ullah, Gyu Sang Choi
author_sort	Furqan Rustam
collection	DOAJ
description	The use of data from social networks such as Twitter has increased over the last few years to improve political campaigns, the quality of products and services, sentiment analysis, etc. Tweet classification based on user sentiments is a collaborative and important task for many organizations. This paper proposes a voting classifier (VC) to support sentiment analysis for such organizations. The VC combines logistic regression (LR) and a stochastic gradient descent classifier (SGDC) and uses a soft voting mechanism to make the final prediction. Tweets were classified into positive, negative and neutral classes based on the sentiments they contain. In addition, a variety of machine learning classifiers were evaluated using accuracy, precision, recall and F1 score as the performance metrics. The impact of feature extraction techniques, including term frequency (TF), term frequency-inverse document frequency (TF-IDF), and word2vec, on classification accuracy was investigated as well. Moreover, the performance of a deep long short-term memory (LSTM) network was analyzed on the selected dataset. The results show that the proposed VC performs better than the other classifiers. The VC achieves accuracies of 0.789 and 0.791 with TF and TF-IDF feature extraction, respectively. The results demonstrate that ensemble classifiers achieve higher accuracy than non-ensemble classifiers. The experiments further showed that machine learning classifiers perform better when TF-IDF is used as the feature extraction method. Word2vec feature extraction performs worse than TF and TF-IDF feature extraction. The LSTM achieves lower accuracy than the machine learning classifiers.
first_indexed	2024-04-13T07:27:18Z
format	Article
id	doaj.art-a9f1ceddb7884099a354c8a0bb72fb6d
institution	Directory of Open Access Journals
issn	1099-4300
language	English
last_indexed	2024-04-13T07:27:18Z
publishDate	2019-11-01
publisher	MDPI AG
record_format	Article
series	Entropy
spelling	doaj.art-a9f1ceddb7884099a354c8a0bb72fb6d | 2022-12-22T02:56:27Z | eng | MDPI AG | Entropy | 1099-4300 | 2019-11-01 | 21 | 11 | 1078 | 10.3390/e21111078 | e21111078 | Tweets Classification on the Base of Sentiments for US Airline Companies | Furqan Rustam [0]; Imran Ashraf [1]; Arif Mehmood [2]; Saleem Ullah [3]; Gyu Sang Choi [4] | [0] Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Punjab 64200, Pakistan; [1] Department of Information & Communication Engineering, Yeungnam University, Gyeongbuk 38541, Korea; [2] Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Punjab 64200, Pakistan; [3] Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Punjab 64200, Pakistan; [4] Department of Information & Communication Engineering, Yeungnam University, Gyeongbuk 38541, Korea | https://www.mdpi.com/1099-4300/21/11/1078 | text mining; text classification; sentiment analysis; supervised machine learning; ensemble classifier; long short-term memory network
title	Tweets Classification on the Base of Sentiments for US Airline Companies
topic	text mining; text classification; sentiment analysis; supervised machine learning; ensemble classifier; long short-term memory network
url	https://www.mdpi.com/1099-4300/21/11/1078
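
The soft voting mechanism named in the description can be illustrated with a minimal sketch. It assumes each base classifier (LR and SGDC in the record) outputs one probability per sentiment class; the function name `soft_vote` and the probability values are hypothetical and are not taken from the article.

```python
from typing import Dict, List

# The three sentiment classes named in the record's description.
LABELS = ["negative", "neutral", "positive"]


def soft_vote(prob_sets: List[Dict[str, float]]) -> str:
    """Average per-class probabilities across classifiers; return the top class."""
    if not prob_sets:
        raise ValueError("need at least one classifier output")
    averaged = {
        label: sum(probs[label] for probs in prob_sets) / len(prob_sets)
        for label in LABELS
    }
    return max(LABELS, key=lambda label: averaged[label])


# Hypothetical class probabilities for a single tweet (illustrative numbers only).
lr_probs = {"negative": 0.50, "neutral": 0.40, "positive": 0.10}
sgdc_probs = {"negative": 0.20, "neutral": 0.70, "positive": 0.10}

# LR alone would pick "negative"; averaging both outputs favours "neutral".
prediction = soft_vote([lr_probs, sgdc_probs])
print(prediction)
```

Soft voting differs from hard (majority) voting in that a confident classifier can outweigh an uncertain one, because probabilities are averaged rather than counted as single votes.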
