Intelligent Hybrid Feature Selection for Textual Sentiment Classification

Sentiment Analysis (SA) aims to extract useful information from online Unstructured User-Generated Contents (UUGC) and classify them into positive and negative classes. State-of-the-art techniques for SA suffer from a high-dimensional feature space because of noisy and irrelevant features from the UUGC....

Bibliographic Details
Main Authors: Jawad Khan, Aftab Alam, Youngmoon Lee
Format: Article
Language: English
Published: IEEE 2021-01-01
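Series: IEEE Access
Subjects: Sentiment classification; hybrid feature selection; ensemble learning; linguistic semantic rules; wide coverage sentiment lexicons; natural language processing
Online Access: https://ieeexplore.ieee.org/document/9564065/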
author Jawad Khan
Aftab Alam
Youngmoon Lee
collection DOAJ
description Sentiment Analysis (SA) aims to extract useful information from online Unstructured User-Generated Contents (UUGC) and classify them into positive and negative classes. State-of-the-art techniques for SA suffer from a high-dimensional feature space because of noisy and irrelevant features in the UUGC. Researchers have proposed feature extraction and selection techniques to reduce this high-dimensional feature space, but these fall short in extracting and selecting the most effective sentiment features for sentiment model learning. Effective feature extraction and selection are important for SA because they can boost the learning algorithm's predictive performance while reducing the high-dimensional feature space. To address these concerns, we propose an Intelligent Hybrid Feature Selection for Sentiment Analysis (IHFSSA) based on ensemble learning methods. IHFSSA first identifies sentiment features in the review text using the Penn Treebank part-of-speech tagset and integrated Wide Coverage Sentiment Lexicons (WCSL). A subset of sentiment features is then selected using a fast and simple rank-based ensemble of multiple filter feature selection methods. The selected sentiment features are further refined by applying a wrapper-based backward feature selection method. Finally, for textual sentiment classification, the well-known classification algorithms Support Vector Machine (SVM), Naive Bayes (NB), and Generalized Linear Model (GLM) are trained as an ensemble model on the refined sentiment feature set. The in-depth evaluation using heterogeneous domain benchmark datasets demonstrates that IHFSSA outperforms existing SA techniques.
first_indexed 2024-12-21T04:01:41Z
format Article
id doaj.art-8c4a410a7e6249259717bb169ab46fd7
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-21T04:01:41Z
publishDate 2021-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-8c4a410a7e6249259717bb169ab46fd7
2022-12-21T19:16:42Z
eng
IEEE
IEEE Access
2169-3536
2021-01-01
Volume 9, pages 140590-140608
DOI 10.1109/ACCESS.2021.3118982
IEEE Xplore document 9564065
Intelligent Hybrid Feature Selection for Textual Sentiment Classification
Jawad Khan, https://orcid.org/0000-0001-8263-7213, Department of Robotics, Hanyang University, Ansan-si, South Korea
Aftab Alam, https://orcid.org/0000-0001-9222-2468, Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar
Youngmoon Lee, https://orcid.org/0000-0002-6393-2994, Department of Robotics, Hanyang University, Ansan-si, South Korea
https://ieeexplore.ieee.org/document/9564065/
Sentiment classification; hybrid feature selection; ensemble learning; linguistic semantic rules; wide coverage sentiment lexicons; natural language processing
title Intelligent Hybrid Feature Selection for Textual Sentiment Classification
topic Sentiment classification
hybrid feature selection
ensemble learning
linguistic semantic rules
wide coverage sentiment lexicons
natural language processing
url https://ieeexplore.ieee.org/document/9564065/
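
Illustrative sketch (not taken from the article): the description mentions a "rank-based ensemble of multiple filters" for selecting sentiment features. The record does not say which filters are combined or how ranks are aggregated, so everything below is an assumption made for illustration: the three filter scores (chi-square, information gain, document-frequency gap), the rank-sum aggregation, the toy corpus, and all function names. It shows only the general idea of scoring each term with several filters, converting scores to ranks, and keeping the terms with the best combined rank.

```python
from math import log2

# Toy labelled reviews (invented data): 1 = positive, 0 = negative.
DOCS = [
    (["great", "battery", "love"], 1),
    (["great", "screen", "good"], 1),
    (["good", "battery", "love"], 1),
    (["bad", "battery", "poor"], 0),
    (["poor", "screen", "bad"], 0),
    (["bad", "good", "screen"], 0),
]

def contingency(term, docs):
    """Counts (a, b, c, d): pos with term, neg with term, pos without, neg without."""
    a = sum(1 for toks, y in docs if y == 1 and term in toks)
    b = sum(1 for toks, y in docs if y == 0 and term in toks)
    c = sum(1 for toks, y in docs if y == 1 and term not in toks)
    d = sum(1 for toks, y in docs if y == 0 and term not in toks)
    return a, b, c, d

def chi_square(a, b, c, d):
    """Chi-square statistic of the 2x2 term/class table."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def entropy(*counts):
    """Shannon entropy in bits of a count vector."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(k / total * log2(k / total) for k in counts if k)

def info_gain(a, b, c, d):
    """Reduction in class entropy from knowing whether the term is present."""
    n = a + b + c + d
    return (entropy(a + c, b + d)
            - (a + b) / n * entropy(a, b)
            - (c + d) / n * entropy(c, d))

def doc_freq_gap(a, b, c, d):
    """Absolute difference in document frequency between the two classes."""
    return abs(a - b)

def ranks(scores):
    """Competition ranking: 1 + number of terms with a strictly higher score."""
    return {t: 1 + sum(1 for s in scores.values() if s > v)
            for t, v in scores.items()}

def ensemble_select(docs, filters, k):
    """Keep the k terms with the smallest rank sum across all filters."""
    vocab = sorted({t for toks, _ in docs for t in toks})
    tables = {t: contingency(t, docs) for t in vocab}
    rank_sum = dict.fromkeys(vocab, 0)
    for f in filters:
        r = ranks({t: f(*tables[t]) for t in vocab})
        for t in vocab:
            rank_sum[t] += r[t]
    # Ties in rank sum are broken alphabetically to keep the result deterministic.
    return sorted(vocab, key=lambda t: (rank_sum[t], t))[:k]

selected = ensemble_select(DOCS, [chi_square, info_gain, doc_freq_gap], k=4)
print(selected)  # ['bad', 'great', 'love', 'poor']
```

On this toy corpus all three filters agree that "bad" is the most discriminative term, followed by "great", "love", and "poor", while domain words such as "battery" and "screen" rank last. Aggregating ranks rather than raw scores avoids having to put the filters on a common scale, which is one common motivation for rank-based ensembles.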