Identification of Review Helpfulness Using Novel Textual and Language-Context Features

With the increase in users of social media websites such as IMDb, a movie website, and the rise of publicly available data, opinion mining is more accessible than ever. In the research field of language understanding, categorization of movie reviews can be challenging because human language is compl...

Bibliographic Details
Main Authors: Muhammad Shehrayar Khan, Atif Rizwan, Muhammad Shahzad Faisal, Tahir Ahmad, Muhammad Saleem Khan, Ghada Atteia
Format: Article
Language: English
Published: MDPI AG 2022-09-01
Series: Mathematics
Subjects: neural network, Word2Vec, Natural Language Processing, sentiment classification
Online Access: https://www.mdpi.com/2227-7390/10/18/3260
author Muhammad Shehrayar Khan
Atif Rizwan
Muhammad Shahzad Faisal
Tahir Ahmad
Muhammad Saleem Khan
Ghada Atteia
collection DOAJ
description With the increase in users of social media websites such as IMDb, a movie website, and the rise of publicly available data, opinion mining is more accessible than ever. In the research field of language understanding, categorizing movie reviews can be challenging because human language is complex and includes connotation words, whose intended meaning differs from their literal meaning. When representing a word, the context in which it is used changes its semantics. This work investigates categorizing movie reviews with good F-Measure scores using Word2Vec and examines three groups of proposed features. First, psychological features are extracted from the reviews: positive emotion, negative emotion, anger, sadness, clout (confidence level) and dictionary words. Second, readability features are extracted: the Automated Readability Index (ARI), the Coleman-Liau Index (CLI) and Word Count (WC) are calculated to measure each review's understandability, and their impact on review classification performance is measured. Lastly, linguistic features are extracted from the reviews: adjectives and adverbs. A Word2Vec model is trained on a collection of 50,000 movie reviews. This self-trained Word2Vec model is used for the contextualized embedding of words into vectors with 50, 100, 150 and 300 dimensions. The pretrained Word2Vec model converts words into vectors with 150 and 300 dimensions. Traditional and advanced machine-learning (ML) algorithms are applied and evaluated according to four performance measures: accuracy, precision, recall and F-Measure. The results indicate that a Support Vector Machine (SVM) using self-trained Word2Vec achieved an 86% F-Measure, and that an SVM using psychological, linguistic and readability features concatenated with Word2Vec features achieved an 87.93% F-Measure.
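The readability features named in the description (ARI, CLI and WC) can be illustrated with a minimal sketch. It uses the standard published formulas for the two indices; the function name `readability_features`, the regex-based tokenization and the sentence-splitting rule are assumptions made for illustration and are not taken from the article, whose exact preprocessing is not stated in this record.

```python
import re


def readability_features(text):
    """Return word count, ARI and CLI for a review (illustrative sketch).

    Tokenization here is a simplifying assumption: words are runs of
    letters, digits or apostrophes, and sentences end at '.', '!' or '?'.
    """
    words = re.findall(r"[A-Za-z0-9']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    n_words = len(words)
    if n_words == 0:
        return {"wc": 0, "ari": 0.0, "cli": 0.0}
    n_sentences = max(len(sentences), 1)

    # ARI counts letters and digits; CLI counts letters only.
    n_alnum = sum(ch.isalnum() for w in words for ch in w)
    n_letters = sum(ch.isalpha() for w in words for ch in w)

    # Automated Readability Index (standard formula).
    ari = 4.71 * n_alnum / n_words + 0.5 * n_words / n_sentences - 21.43

    # Coleman-Liau Index: L = letters per 100 words, S = sentences per 100 words.
    letters_per_100 = 100.0 * n_letters / n_words
    sentences_per_100 = 100.0 * n_sentences / n_words
    cli = 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8

    return {"wc": n_words, "ari": ari, "cli": cli}


if __name__ == "__main__":
    print(readability_features("The movie was great. I loved it!"))
```

The three values could then be appended to a review's embedding vector before classification, which is one plausible reading of the feature concatenation the description mentions.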
first_indexed 2024-03-09T23:16:17Z
format Article
id doaj.art-10d7c63128bc4cadbd6f1e886282a53a
institution Directory Open Access Journal
issn 2227-7390
language English
last_indexed 2024-03-09T23:16:17Z
publishDate 2022-09-01
publisher MDPI AG
record_format Article
series Mathematics
spelling doaj.art-10d7c63128bc4cadbd6f1e886282a53a
2023-11-23T17:35:31Z
eng
MDPI AG
Mathematics
2227-7390
2022-09-01
10
18
3260
10.3390/math10183260
Identification of Review Helpfulness Using Novel Textual and Language-Context Features
Muhammad Shehrayar Khan (Department of Computer Science, COMSATS University Islamabad, Attock Campus, Islamabad 43600, Pakistan)
Atif Rizwan (Department of Computer Engineering, Jeju National University, Jeju-si 63243, Korea)
Muhammad Shahzad Faisal (Department of Computer Science, COMSATS University Islamabad, Attock Campus, Islamabad 43600, Pakistan)
Tahir Ahmad (Department of Computer Science, COMSATS University Islamabad, Attock Campus, Islamabad 43600, Pakistan)
Muhammad Saleem Khan (Department of Computer Science, COMSATS University Islamabad, Attock Campus, Islamabad 43600, Pakistan)
Ghada Atteia (Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia)
https://www.mdpi.com/2227-7390/10/18/3260
neural network
Word2Vec
Natural Language Processing
sentiment classification
title Identification of Review Helpfulness Using Novel Textual and Language-Context Features
topic neural network
Word2Vec
Natural Language Processing
sentiment classification
url https://www.mdpi.com/2227-7390/10/18/3260