TF-TDA: A Novel Supervised Term Weighting Scheme for Sentiment Analysis

In text classification tasks, such as sentiment analysis (SA), feature representation and weighting schemes play a crucial role in classification performance. Traditional term weighting schemes depend on the term frequency within the entire document collection; therefore, they are called unsupervise...


Bibliographic Details
Main Authors: Arwa Alshehri, Abdulmohsen Algarni
Format: Article
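Language: English
Published: MDPI AG 2023-03-01
Series: Electronics
Subjects: machine learning; text classification; sentiment analysis; feature extraction; supervised term weighting
Online Access: https://www.mdpi.com/2079-9292/12/7/1632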
author Arwa Alshehri
Abdulmohsen Algarni
collection DOAJ
description In text classification tasks, such as sentiment analysis (SA), feature representation and weighting schemes play a crucial role in classification performance. Traditional term weighting schemes depend on the term frequency within the entire document collection; therefore, they are called unsupervised term weighting (UTW) schemes. One of the most popular UTW schemes is term frequency–inverse document frequency (TF-IDF); however, this is not sufficient for SA tasks. Newer weighting schemes have been developed to take advantage of the membership of documents in their categories. These are called supervised term weighting (STW) schemes; however, most of them weight the extracted features without considering the characteristics of some noisy features and data imbalances. Therefore, this study proposes a novel STW approach, term frequency–term discrimination ability (TF-TDA). TF-TDA represents the extracted features with different degrees of discrimination by categorizing them into several groups. Subsequently, each group is weighted based on its contribution. The proposed method was examined over four SA datasets using naive Bayes (NB) and support vector machine (SVM) models. The experimental results showed that TF-TDA outperformed two baseline term weighting approaches, with improvements ranging from 0.52% to 3.99% in the F1 score. The statistical test results verified the significant improvement obtained by TF-TDA in most cases, where the p-value ranged from 0.0000597 to 0.0455.
first_indexed 2024-03-11T05:39:05Z
format Article
id doaj.art-aa9e50cd0ffb4e23991f6face983b4bb
institution Directory Open Access Journal
issn 2079-9292
language English
last_indexed 2024-03-11T05:39:05Z
publishDate 2023-03-01
publisher MDPI AG
record_format Article
series Electronics
spelling doaj.art-aa9e50cd0ffb4e23991f6face983b4bb 2023-11-17T16:33:20Z eng MDPI AG, Electronics, ISSN 2079-9292, 2023-03-01, vol. 12, no. 7, art. 1632, doi:10.3390/electronics12071632
Arwa Alshehri (Department of Computer Science, College of Computer Science, King Khalid University, Abha 62529, Saudi Arabia)
Abdulmohsen Algarni (Department of Computer Science, College of Computer Science, King Khalid University, Abha 62529, Saudi Arabia)
title TF-TDA: A Novel Supervised Term Weighting Scheme for Sentiment Analysis
topic machine learning
text classification
sentiment analysis
feature extraction
supervised term weighting
url https://www.mdpi.com/2079-9292/12/7/1632
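
The description above contrasts unsupervised term weighting (TF-IDF) with supervised schemes that use category membership. The following is a minimal sketch of that contrast only. It does not implement TF-TDA, whose formula is not given in this record; as a stand-in supervised scheme it uses TF-RF (relevance frequency). The toy corpus, the function names, and the choice of "pos" as the positive category are all assumptions made for illustration.

```python
import math
from collections import Counter

# Toy labelled corpus (hypothetical data, for illustration only).
DOCS = [
    ("good great film", "pos"),
    ("great acting good plot", "pos"),
    ("bad film", "neg"),
    ("bad boring plot", "neg"),
]


def doc_freq(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for text, _ in docs:
        df.update(set(text.split()))
    return df


def tf_idf(term, text, docs):
    """Unsupervised weight: raw term frequency times log(N / df).

    Category labels are ignored entirely.
    """
    tf = text.split().count(term)
    df = doc_freq(docs)[term]
    return tf * math.log(len(docs) / df) if df else 0.0


def tf_rf(term, text, docs, positive="pos"):
    """Supervised stand-in (TF-RF, not TF-TDA).

    rf = log2(2 + a / max(1, c)), where a and c are the numbers of
    documents containing the term inside and outside the positive category.
    """
    tf = text.split().count(term)
    a = sum(1 for t, y in docs if y == positive and term in t.split())
    c = sum(1 for t, y in docs if y != positive and term in t.split())
    return tf * math.log2(2 + a / max(1, c))


if __name__ == "__main__":
    first_doc = DOCS[0][0]
    for term in ("good", "film"):
        print(term, tf_idf(term, first_doc, DOCS), tf_rf(term, first_doc, DOCS))
```

In this toy corpus "good" and "film" each occur in two of the four documents, so TF-IDF gives them the same weight, whereas the supervised weight is higher for "good" because it occurs only in the positive category.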