TF-TDA: A Novel Supervised Term Weighting Scheme for Sentiment Analysis
In text classification tasks, such as sentiment analysis (SA), feature representation and weighting schemes play a crucial role in classification performance. Traditional term weighting schemes depend on the term frequency within the entire document collection; therefore, they are called unsupervise...
Main Authors: | Arwa Alshehri, Abdulmohsen Algarni |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-03-01 |
Series: | Electronics |
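Subjects: | machine learning, text classification, sentiment analysis, feature extraction, supervised term weighting |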
Online Access: | https://www.mdpi.com/2079-9292/12/7/1632 |
_version_ | 1797608116376829952 |
---|---|
author | Arwa Alshehri; Abdulmohsen Algarni |
author_facet | Arwa Alshehri; Abdulmohsen Algarni |
author_sort | Arwa Alshehri |
collection | DOAJ |
description | In text classification tasks, such as sentiment analysis (SA), feature representation and weighting schemes play a crucial role in classification performance. Traditional term weighting schemes depend on the term frequency within the entire document collection; therefore, they are called unsupervised term weighting (UTW) schemes. One of the most popular UTW schemes is term frequency–inverse document frequency (TF-IDF); however, this is not sufficient for SA tasks. Newer weighting schemes have been developed to take advantage of the membership of documents in their categories. These are called supervised term weighting (STW) schemes; however, most of them weigh the extracted features without considering the characteristics of some noisy features and data imbalances. Therefore, in this study, a novel STW approach was proposed, known as term frequency–term discrimination ability (TF-TDA). TF-TDA mainly presents the extracted features with different degrees of discrimination by categorizing them into several groups. Subsequently, each group is weighted based on its contribution. The proposed method was examined over four SA datasets using naive Bayes (NB) and support vector machine (SVM) models. The experimental results proved the superiority of TF-TDA over two baseline term weighting approaches, with improvements ranging from 0.52% to 3.99% in the F1 score. The statistical test results verified the significant improvement obtained by TF-TDA in most cases, where the p-value ranged from 0.0000597 to 0.0455. |
first_indexed | 2024-03-11T05:39:05Z |
format | Article |
id | doaj.art-aa9e50cd0ffb4e23991f6face983b4bb |
institution | Directory Open Access Journal |
issn | 2079-9292 |
language | English |
last_indexed | 2024-03-11T05:39:05Z |
publishDate | 2023-03-01 |
publisher | MDPI AG |
record_format | Article |
series | Electronics |
spellingShingle | Arwa Alshehri Abdulmohsen Algarni TF-TDA: A Novel Supervised Term Weighting Scheme for Sentiment Analysis Electronics machine learning text classification sentiment analysis feature extraction supervised term weighting |
title | TF-TDA: A Novel Supervised Term Weighting Scheme for Sentiment Analysis |
title_sort | tf tda a novel supervised term weighting scheme for sentiment analysis |
topic | machine learning text classification sentiment analysis feature extraction supervised term weighting |
url | https://www.mdpi.com/2079-9292/12/7/1632 |
work_keys_str_mv | AT arwaalshehri tftdaanovelsupervisedtermweightingschemeforsentimentanalysis AT abdulmohsenalgarni tftdaanovelsupervisedtermweightingschemeforsentimentanalysis |
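
The description above contrasts unsupervised weighting such as TF-IDF, which uses only term statistics over the whole collection, with supervised schemes that also use class membership. As a minimal sketch of the unsupervised baseline only, the following computes one common TF-IDF variant (raw term count multiplied by log(N / df)). The function name `tf_idf` and the toy corpus are hypothetical and chosen for illustration; the TF-TDA grouping and weighting rules are not specified in this record and are not reproduced here.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: weight = raw term count * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical toy corpus; class labels are never consulted,
# which is what makes this weighting "unsupervised".
docs = [
    "good movie good plot".split(),
    "bad movie".split(),
    "good acting".split(),
]
weights = tf_idf(docs)
```

Because no labels enter the computation, a term spread evenly across positive and negative documents can receive the same weight as one concentrated in a single class; that is the gap the supervised schemes named in the description aim to close.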