Assessing Accuracy: A Study of Lexicon and Rule-Based Packages in R and Python for Sentiment Analysis
Sentiment analysis has become a focal point of interdisciplinary research, prompting the use of diverse methodologies and the continual emergence of programming language packages. Notably, Python and R have introduced comprehensive packages in this realm. In this study, we analyze established packag...
Main Authors: | Amin Mahmoudi, Dariusz Jemielniak, Leon Ciechanowski |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2024-01-01 |
Series: | IEEE Access |
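Subjects: | Sentiment analysis; lexicon and rule based; sentiment analysis by R; sentiment analysis by python; VADER; sentimentr |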
Online Access: | https://ieeexplore.ieee.org/document/10398176/ |
_version_ | 1827355319067475968 |
---|---|
author | Amin Mahmoudi Dariusz Jemielniak Leon Ciechanowski |
author_facet | Amin Mahmoudi Dariusz Jemielniak Leon Ciechanowski |
author_sort | Amin Mahmoudi |
collection | DOAJ |
description | Sentiment analysis has become a focal point of interdisciplinary research, prompting the use of diverse methodologies and the continual emergence of programming language packages. Notably, Python and R have introduced comprehensive packages in this realm. In this study, we analyze established packages in these languages, focusing on accuracy while also considering time complexity. Across experiments conducted on seven distinct datasets, a crucial revelation surfaces: the accuracy of these packages significantly varies depending on the dataset used. Among these, the ‘sentimentr’ package consistently performs well across diverse datasets. Generally, Python libraries showcase superior processing speed. However, it’s essential to note that while these packages adeptly classify sentences as positive or negative, capturing sentiment intensity proves challenging. Our findings highlight a prevalent trend of overfitting, where these packages excel on familiar datasets but struggle when faced with unfamiliar ones. |
first_indexed | 2024-03-08T04:09:03Z |
format | Article |
id | doaj.art-d7701f0090284e429dbf1c8aa8a2ff78 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-08T04:09:03Z |
publishDate | 2024-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-d7701f0090284e429dbf1c8aa8a2ff78; 2024-02-09T00:02:57Z; eng; IEEE; IEEE Access; 2169-3536; 2024-01-01; 12; 20169; 20180; 10.1109/ACCESS.2024.3353692; 10398176; Assessing Accuracy: A Study of Lexicon and Rule-Based Packages in R and Python for Sentiment Analysis; Amin Mahmoudi [0] https://orcid.org/0000-0002-8407-7034; Dariusz Jemielniak [1] https://orcid.org/0000-0002-3745-7931; Leon Ciechanowski [2] https://orcid.org/0000-0002-4569-7222; [0] Management in Networked and Digital Societies (MINDS) Department, Kozminski University, Warsaw, Poland; [1] Management in Networked and Digital Societies (MINDS) Department, Kozminski University, Warsaw, Poland; [2] Management in Networked and Digital Societies (MINDS) Department, Kozminski University, Warsaw, Poland; Sentiment analysis has become a focal point of interdisciplinary research, prompting the use of diverse methodologies and the continual emergence of programming language packages. Notably, Python and R have introduced comprehensive packages in this realm. In this study, we analyze established packages in these languages, focusing on accuracy while also considering time complexity. Across experiments conducted on seven distinct datasets, a crucial revelation surfaces: the accuracy of these packages significantly varies depending on the dataset used. Among these, the ‘sentimentr’ package consistently performs well across diverse datasets. Generally, Python libraries showcase superior processing speed. However, it’s essential to note that while these packages adeptly classify sentences as positive or negative, capturing sentiment intensity proves challenging. Our findings highlight a prevalent trend of overfitting, where these packages excel on familiar datasets but struggle when faced with unfamiliar ones.; https://ieeexplore.ieee.org/document/10398176/; Sentiment analysis; lexicon and rule based; sentiment analysis by R; sentiment analysis by python; VADER; sentimentr |
spellingShingle | Amin Mahmoudi Dariusz Jemielniak Leon Ciechanowski Assessing Accuracy: A Study of Lexicon and Rule-Based Packages in R and Python for Sentiment Analysis IEEE Access Sentiment analysis lexicon and rule based sentiment analysis by R sentiment analysis by python VADER sentimentr |
title | Assessing Accuracy: A Study of Lexicon and Rule-Based Packages in R and Python for Sentiment Analysis |
title_full | Assessing Accuracy: A Study of Lexicon and Rule-Based Packages in R and Python for Sentiment Analysis |
title_fullStr | Assessing Accuracy: A Study of Lexicon and Rule-Based Packages in R and Python for Sentiment Analysis |
title_full_unstemmed | Assessing Accuracy: A Study of Lexicon and Rule-Based Packages in R and Python for Sentiment Analysis |
title_short | Assessing Accuracy: A Study of Lexicon and Rule-Based Packages in R and Python for Sentiment Analysis |
title_sort | assessing accuracy a study of lexicon and rule based packages in r and python for sentiment analysis |
topic | Sentiment analysis lexicon and rule based sentiment analysis by R sentiment analysis by python VADER sentimentr |
url | https://ieeexplore.ieee.org/document/10398176/ |
work_keys_str_mv | AT aminmahmoudi assessingaccuracyastudyoflexiconandrulebasedpackagesinrandpythonforsentimentanalysis AT dariuszjemielniak assessingaccuracyastudyoflexiconandrulebasedpackagesinrandpythonforsentimentanalysis AT leonciechanowski assessingaccuracyastudyoflexiconandrulebasedpackagesinrandpythonforsentimentanalysis |