Statistics-Based Outlier Detection and Correction Method for Amazon Customer Reviews
Main Authors: | Ishani Chatterjee, Mengchu Zhou, Abdullah Abusorrah, Khaled Sedraoui, Ahmed Alabdulwahab |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-12-01 |
Series: | Entropy |
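Subjects: | sentiment analysis, interquartile range, TextBlob, natural language processing, outlier detection, data scraping |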
Online Access: | https://www.mdpi.com/1099-4300/23/12/1645 |
_version_ | 1797504916911030272 |
---|---|
author | Ishani Chatterjee, Mengchu Zhou, Abdullah Abusorrah, Khaled Sedraoui, Ahmed Alabdulwahab |
collection | DOAJ |
description | People nowadays use the internet to share their assessments, impressions, ideas, and observations about various subjects and products on numerous social networking sites. These sites are a rich source of data for data analytics, sentiment analysis, natural language processing, and related tasks. Conventionally, the true sentiment of a customer review matches its star rating. There are exceptions, however, in which the star rating of a review is opposite to its true sentiment; in this work, such reviews are labeled as outliers in a dataset. State-of-the-art anomaly detection methods rely on manual searching, predefined rules, or traditional machine learning techniques to detect such instances. This paper conducts a sentiment analysis and outlier detection case study for Amazon customer reviews and proposes a statistics-based outlier detection and correction method (SODCM), which identifies such reviews and rectifies their star ratings to enhance the performance of a sentiment analysis algorithm without any data loss. SODCM is applied to datasets containing customer reviews of various products, which are (a) scraped from Amazon.com and (b) publicly available. The paper also analyzes the datasets and evaluates the effect of SODCM on the performance of a sentiment analysis algorithm. The results show that SODCM achieves higher accuracy and recall than other state-of-the-art anomaly detection algorithms. |
first_indexed | 2024-03-10T04:11:13Z |
format | Article |
id | doaj.art-7a158b9a1e0a4959ae0cd215c879b04d |
institution | Directory of Open Access Journals |
issn | 1099-4300 |
language | English |
last_indexed | 2024-03-10T04:11:13Z |
publishDate | 2021-12-01 |
publisher | MDPI AG |
record_format | Article |
series | Entropy |
spelling | Entropy, vol. 23, no. 12, art. 1645 (ISSN 1099-4300), MDPI AG, 2021-12-01; DOI: 10.3390/e23121645. Affiliations: Ishani Chatterjee and Mengchu Zhou, Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ 07102, USA; Abdullah Abusorrah, Khaled Sedraoui, and Ahmed Alabdulwahab, Department of Electrical and Computer Engineering, Faculty of Engineering, and Center of Research Excellence in Renewable Energy and Power Systems, King Abdulaziz University, Jeddah 21481, Saudi Arabia. |
title | Statistics-Based Outlier Detection and Correction Method for Amazon Customer Reviews |
topic | sentiment analysis, interquartile range, TextBlob, natural language processing, outlier detection, data scraping |
url | https://www.mdpi.com/1099-4300/23/12/1645 |
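The description characterizes outliers as reviews whose star rating contradicts the sentiment of their text, and the record's keywords point to the interquartile range and TextBlob. The following is a minimal Python sketch of that general idea under stated assumptions; it is not SODCM as published. It assumes each review already carries a sentiment polarity score in [-1, 1] (for example, TextBlob's `sentiment.polarity`), applies Tukey's 1.5 × IQR fences to the polarity scores within each star-rating group, and, as a hypothetical correction rule, reassigns a flagged review to the star rating whose median polarity is closest. The names `Review`, `iqr_bounds`, and `detect_and_correct` and the toy scores are illustrative only.

```python
from dataclasses import dataclass
from statistics import median, quantiles


@dataclass
class Review:
    stars: int       # star rating as given by the customer (1-5)
    polarity: float  # sentiment score in [-1, 1], assumed precomputed (e.g. by TextBlob)


def iqr_bounds(values, k=1.5):
    """Return Tukey fences (Q1 - k*IQR, Q3 + k*IQR) for a list of scores."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def detect_and_correct(reviews, k=1.5):
    """Flag reviews whose polarity is an IQR outlier within their star group.

    Returns (indices of flagged reviews, list of corrected star ratings).
    The correction rule (nearest group median) is a hypothetical choice.
    """
    by_star = {}
    for r in reviews:
        by_star.setdefault(r.stars, []).append(r.polarity)

    # Quartiles need at least two scores; smaller groups are never flagged.
    bounds = {s: iqr_bounds(v, k) for s, v in by_star.items() if len(v) >= 2}
    medians = {s: median(v) for s, v in by_star.items()}

    flagged, corrected = [], []
    for i, r in enumerate(reviews):
        lo, hi = bounds.get(r.stars, (float("-inf"), float("inf")))
        if lo <= r.polarity <= hi:
            corrected.append(r.stars)  # consistent with its group: keep rating
        else:
            flagged.append(i)
            # Relabel with the star rating whose median polarity is closest.
            corrected.append(min(medians, key=lambda s: abs(medians[s] - r.polarity)))
    return flagged, corrected


if __name__ == "__main__":
    # Toy polarity scores, invented for illustration only.
    reviews = [Review(5, p) for p in (0.8, 0.7, 0.9, 0.75, 0.85, -0.6)]
    reviews += [Review(1, p) for p in (-0.8, -0.7, -0.75, -0.9, -0.85)]
    flagged, corrected = detect_and_correct(reviews)
    print(flagged)    # [5]
    print(corrected)  # [5, 5, 5, 5, 5, 1, 1, 1, 1, 1, 1]
```

Because the fences are computed per star group, a strongly negative review filed under five stars stands out against the other five-star reviews even when negative reviews are common in the dataset as a whole. Every review is kept and only its rating may change, which mirrors the "without any data loss" goal stated in the description; groups with fewer than two reviews are left unchanged, since quartiles cannot be estimated from a single score.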