COMPARISON OF DATA MODELS FOR UNSUPERVISED TWITTER SENTIMENT ANALYSIS

Identifying the sentiment of collected tweets has become a challenging and interesting task. In addition, mining and defining relevant features that can improve the quality of a classification system is crucial. The data modeling phase is fundamental for the whole process since it can reveal hidden...

Bibliographic Details
Main Author:	Sergiu LIMBOI
Format:	Article
Language:	English
Published:	Babes-Bolyai University, Cluj-Napoca 2023-05-01
Series:	Studia Universitatis Babes-Bolyai: Series Informatica
Subjects:	Sentiment Analysis, Twitter, Data Representation, Hashtags, Clustering
Online Access:	http://193.231.18.162/index.php/subbinformatica/article/view/5804

collection	DOAJ
description	Identifying the sentiment of collected tweets has become a challenging and interesting task. In addition, mining and defining relevant features that can improve the quality of a classification system is crucial. The data modeling phase is fundamental for the whole process since it can reveal hidden information from the textual inputs. Two models are defined in the presented paper, considering Twitter-specific concepts: a hashtag-based representation and a text-based one. These models are compared and integrated into an unsupervised system that determines groups of tweets based on sentiment labels (positive and negative). Moreover, word-embedding techniques (TF-IDF and frequency vectors) are used to convert the representations into the numeric input needed by the clustering methods. The experimental results show good values for the Silhouette and Davies-Bouldin measures in the unsupervised environment. A detailed investigation is presented considering several items (dataset, clustering method, data representation, and word embeddings) to find the best setup for improving the quality of sentiment detection in Twitter messages. The analysis and conclusions show that these first results can serve as a basis for more complex experiments. Received by the editors: 4 April 2023. 2010 Mathematics Subject Classification. 68T30, 68T50. 1998 CR Categories and Descriptors. I.2.7 [Artificial Intelligence]: Natural Language Processing – Text analysis.
first_indexed	2024-03-08T05:11:34Z
id	doaj.art-d5f2475e6d5f49768163fb98887d311b
institution	Directory Open Access Journal
issn	2065-9601
last_indexed	2024-03-08T05:11:34Z
spelling	doaj.art-d5f2475e6d5f49768163fb98887d311b; 2024-02-07T10:03:29Z; eng; Babes-Bolyai University, Cluj-Napoca; Studia Universitatis Babes-Bolyai: Series Informatica; 2065-9601; 2023-05-01; 67; 2; 10.24193/subbi.2022.2.05; COMPARISON OF DATA MODELS FOR UNSUPERVISED TWITTER SENTIMENT ANALYSIS; Sergiu LIMBOI (Faculty of Mathematics and Computer Science, Babeș-Bolyai University, Cluj-Napoca, Romania. Email: sergiu.limboi@ubbcluj.ro); http://193.231.18.162/index.php/subbinformatica/article/view/5804; Sentiment Analysis, Twitter, Data Representation, Hashtags, Clustering
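
The description above names two data models (hashtag-based and text-based) and two vectorizations (frequency vectors and TF-IDF) but does not give their exact definitions. The following is a minimal sketch of how such representations could be built, not the author's implementation. The function names, the tokenization rules, the smoothed IDF formula, and the three sample tweets are all assumptions made for illustration. The clustering step and the Silhouette and Davies-Bouldin evaluation are left out.

```python
import math
import re
from collections import Counter

# Assumed tokenization rules; the paper may preprocess tweets differently.
HASHTAG = re.compile(r"#(\w+)")
TOKEN = re.compile(r"[a-z']+")


def hashtag_model(tweet):
    """Hashtag-based representation: keep only the hashtags, lowercased."""
    return [tag.lower() for tag in HASHTAG.findall(tweet)]


def text_model(tweet):
    """Text-based representation: lowercased words with hashtags removed."""
    return TOKEN.findall(HASHTAG.sub(" ", tweet.lower()))


def frequency_vectors(docs):
    """Raw term counts per document over a shared, sorted vocabulary."""
    vocab = sorted({term for doc in docs for term in doc})
    counts = [Counter(doc) for doc in docs]
    return vocab, [[c[term] for term in vocab] for c in counts]


def tfidf_vectors(docs):
    """TF-IDF using a smoothed idf = ln((1 + n) / (1 + df)) + 1.

    This is one common variant, chosen here as an assumption.
    """
    vocab, freqs = frequency_vectors(docs)
    n = len(docs)
    df = [sum(1 for row in freqs if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    return vocab, [[tf * w for tf, w in zip(row, idf)] for row in freqs]


# Hypothetical sample tweets, not taken from the paper's datasets.
tweets = [
    "Love this phone #happy #win",
    "Worst service ever #fail",
    "So happy with the update #happy",
]

hashtag_docs = [hashtag_model(t) for t in tweets]
text_docs = [text_model(t) for t in tweets]

vocab, freq = frequency_vectors(hashtag_docs)
_, tfidf = tfidf_vectors(hashtag_docs)
```

Either list of vectors can then be passed to a clustering method with two clusters, one per sentiment label. In the TF-IDF variant, a hashtag that occurs in fewer tweets receives a larger weight than one that occurs in many.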


Similar Items