Identification of Review Helpfulness Using Novel Textual and Language-Context Features
With the increase in users of social media websites such as IMDb, a movie website, and the rise of publicly available data, opinion mining is more accessible than ever. In the research field of language understanding, categorization of movie reviews can be challenging because human language is compl...
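Main Authors: | Muhammad Shehrayar Khan, Atif Rizwan, Muhammad Shahzad Faisal, Tahir Ahmad, Muhammad Saleem Khan, Ghada Atteia |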
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2022-09-01 |
Series: | Mathematics |
Subjects: | neural network, Word2Vec, Natural Language Processing, sentiment classification |
Online Access: | https://www.mdpi.com/2227-7390/10/18/3260 |
_version_ | 1797485194780868608 |
---|---|
author | Muhammad Shehrayar Khan; Atif Rizwan; Muhammad Shahzad Faisal; Tahir Ahmad; Muhammad Saleem Khan; Ghada Atteia |
author_sort | Muhammad Shehrayar Khan |
collection | DOAJ |
description | With the growth in users of social media websites such as IMDb, a movie website, and the rise of publicly available data, opinion mining is more accessible than ever. In language-understanding research, categorizing movie reviews can be challenging because human language is complex and includes connotation words, whose intended meaning differs from their literal meaning. The context in which a word is used also changes its semantics. This work investigates categorizing movie reviews with good F-Measure scores using Word2Vec, and inspects three groups of proposed features. First, psychological features are extracted from reviews: positive emotion, negative emotion, anger, sadness, clout (confidence level) and dictionary words. Second, readability features are extracted: the Automated Readability Index (ARI), the Coleman-Liau Index (CLI) and Word Count (WC) are calculated to measure each review’s understandability, and their impact on review classification performance is measured. Lastly, linguistic features are extracted from reviews: adjectives and adverbs. A Word2Vec model is trained on a collection of 50,000 movie reviews. This self-trained Word2Vec model is used for the contextualized embedding of words into vectors with 50, 100, 150 and 300 dimensions. The pretrained Word2Vec model converts words into vectors with 150 and 300 dimensions. Traditional and advanced machine-learning (ML) algorithms are applied and evaluated on accuracy, precision, recall and F-Measure. The results indicate that a Support Vector Machine (SVM) using self-trained Word2Vec achieved an 86% F-Measure, and that an SVM using psychological, linguistic and readability features concatenated with Word2Vec features achieved an 87.93% F-Measure. |
first_indexed | 2024-03-09T23:16:17Z |
format | Article |
id | doaj.art-10d7c63128bc4cadbd6f1e886282a53a |
institution | Directory of Open Access Journals |
issn | 2227-7390 |
language | English |
last_indexed | 2024-03-09T23:16:17Z |
publishDate | 2022-09-01 |
publisher | MDPI AG |
record_format | Article |
series | Mathematics |
spelling | doaj.art-10d7c63128bc4cadbd6f1e886282a53a; 2023-11-23T17:35:31Z; eng; MDPI AG; Mathematics; 2227-7390; 2022-09-01; 10; 18; 3260; 10.3390/math10183260; Identification of Review Helpfulness Using Novel Textual and Language-Context Features; Muhammad Shehrayar Khan (0); Atif Rizwan (1); Muhammad Shahzad Faisal (2); Tahir Ahmad (3); Muhammad Saleem Khan (4); Ghada Atteia (5); (0) Department of Computer Science, COMSATS University Islamabad, Attock Campus, Islamabad 43600, Pakistan; (1) Department of Computer Engineering, Jeju National University, Jeju-si 63243, Korea; (2) Department of Computer Science, COMSATS University Islamabad, Attock Campus, Islamabad 43600, Pakistan; (3) Department of Computer Science, COMSATS University Islamabad, Attock Campus, Islamabad 43600, Pakistan; (4) Department of Computer Science, COMSATS University Islamabad, Attock Campus, Islamabad 43600, Pakistan; (5) Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia; https://www.mdpi.com/2227-7390/10/18/3260; neural network; Word2Vec; Natural Language Processing; sentiment classification |
title | Identification of Review Helpfulness Using Novel Textual and Language-Context Features |
topic | neural network; Word2Vec; Natural Language Processing; sentiment classification |
url | https://www.mdpi.com/2227-7390/10/18/3260 |
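
The readability features named in the description (ARI, CLI and WC) can be illustrated with a minimal sketch. The ARI and Coleman-Liau formulas below are the standard published definitions; the regex-based word and sentence splitting and the function name `readability_features` are assumptions made for illustration, since this record does not state how the authors tokenized reviews.

```python
import re


def readability_features(text: str) -> dict:
    """Return Word Count, ARI and CLI for one review.

    Tokenization here is a simple regex heuristic (an assumption),
    not the procedure used in the article.
    """
    # Words: runs of letters, digits and apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Sentences: non-empty chunks between ., ! or ? marks.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    wc = len(words)
    n_sent = len(sentences)
    if wc == 0 or n_sent == 0:
        return {"wc": 0, "ari": 0.0, "cli": 0.0}

    # ARI counts letters and digits; CLI counts letters only.
    chars = sum(ch.isalnum() for w in words for ch in w)
    letters = sum(ch.isalpha() for w in words for ch in w)

    # Automated Readability Index (standard formula).
    ari = 4.71 * (chars / wc) + 0.5 * (wc / n_sent) - 21.43

    # Coleman-Liau Index: L = letters per 100 words, S = sentences per 100 words.
    L = letters / wc * 100
    S = n_sent / wc * 100
    cli = 0.0588 * L - 0.296 * S - 15.8

    return {"wc": wc, "ari": ari, "cli": cli}


if __name__ == "__main__":
    print(readability_features("The movie was great. I loved it!"))
```

In a pipeline like the one the description outlines, these three values would be appended to each review's Word2Vec vector before classification; how the authors scaled or ordered the concatenated features is not given in this record.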