Assessing Accuracy: A Study of Lexicon and Rule-Based Packages in R and Python for Sentiment Analysis

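Sentiment analysis has become a focal point of interdisciplinary research, prompting the use of diverse methodologies and the continual emergence of programming language packages. Notably, Python and R have introduced comprehensive packages in this realm. In this study, we analyze established packages in these languages, focusing on accuracy while also considering time complexity. Across experiments conducted on seven distinct datasets, a crucial revelation surfaces: the accuracy of these packages significantly varies depending on the dataset used. Among these, the ‘sentimentr’ package consistently performs well across diverse datasets. Generally, Python libraries showcase superior processing speed. However, it’s essential to note that while these packages adeptly classify sentences as positive or negative, capturing sentiment intensity proves challenging. Our findings highlight a prevalent trend of overfitting, where these packages excel on familiar datasets but struggle when faced with unfamiliar ones.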

Bibliographic Details
Main Authors: Amin Mahmoudi, Dariusz Jemielniak, Leon Ciechanowski
Format: Article
Language: English
Published: IEEE 2024-01-01
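Series: IEEE Access
Subjects: Sentiment analysis; lexicon and rule based; sentiment analysis by R; sentiment analysis by python; VADER; sentimentr
Online Access: https://ieeexplore.ieee.org/document/10398176/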
author Amin Mahmoudi
Dariusz Jemielniak
Leon Ciechanowski
collection DOAJ
description Sentiment analysis has become a focal point of interdisciplinary research, prompting the use of diverse methodologies and the continual emergence of programming language packages. Notably, Python and R have introduced comprehensive packages in this realm. In this study, we analyze established packages in these languages, focusing on accuracy while also considering time complexity. Across experiments conducted on seven distinct datasets, a crucial revelation surfaces: the accuracy of these packages significantly varies depending on the dataset used. Among these, the ‘sentimentr’ package consistently performs well across diverse datasets. Generally, Python libraries showcase superior processing speed. However, it’s essential to note that while these packages adeptly classify sentences as positive or negative, capturing sentiment intensity proves challenging. Our findings highlight a prevalent trend of overfitting, where these packages excel on familiar datasets but struggle when faced with unfamiliar ones.
first_indexed 2024-03-08T04:09:03Z
format Article
id doaj.art-d7701f0090284e429dbf1c8aa8a2ff78
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-03-08T04:09:03Z
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-d7701f0090284e429dbf1c8aa8a2ff78
2024-02-09T00:02:57Z
eng
IEEE
IEEE Access
2169-3536
2024-01-01
Volume 12, pages 20169-20180
DOI 10.1109/ACCESS.2024.3353692
IEEE Xplore document 10398176
Amin Mahmoudi, https://orcid.org/0000-0002-8407-7034
Dariusz Jemielniak, https://orcid.org/0000-0002-3745-7931
Leon Ciechanowski, https://orcid.org/0000-0002-4569-7222
Affiliation (all three authors): Management in Networked and Digital Societies (MINDS) Department, Kozminski University, Warsaw, Poland
title Assessing Accuracy: A Study of Lexicon and Rule-Based Packages in R and Python for Sentiment Analysis
topic Sentiment analysis
lexicon and rule based
sentiment analysis by R
sentiment analysis by python
VADER
sentimentr
url https://ieeexplore.ieee.org/document/10398176/