Statistics-Based Outlier Detection and Correction Method for Amazon Customer Reviews

People nowadays use the internet to project their assessments, impressions, ideas, and observations about various subjects or products on numerous social networking sites. These sites serve as a great source to gather data for data analytics, sentiment analysis, natural language processing, etc. Con...

Bibliographic Details
Main Authors: Ishani Chatterjee, Mengchu Zhou, Abdullah Abusorrah, Khaled Sedraoui, Ahmed Alabdulwahab
Format: Article
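Language: English
Published: MDPI AG 2021-12-01
Series: Entropy
Subjects: sentiment analysis, interquartile range, TextBlob, natural language processing, outlier detection, data scrapping
Online Access: https://www.mdpi.com/1099-4300/23/12/1645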
author Ishani Chatterjee
Mengchu Zhou
Abdullah Abusorrah
Khaled Sedraoui
Ahmed Alabdulwahab
collection DOAJ
description People nowadays use the internet to project their assessments, impressions, ideas, and observations about various subjects or products on numerous social networking sites. These sites serve as a great source to gather data for data analytics, sentiment analysis, natural language processing, etc. Conventionally, the true sentiment of a customer review matches its corresponding star rating. There are exceptions in which the star rating of a review is opposite to its true sentiment. This work labels such reviews as the outliers in a dataset. The state-of-the-art methods for anomaly detection rely on manual searching, predefined rules, or traditional machine learning techniques to detect such instances. This paper conducts a sentiment analysis and outlier detection case study for Amazon customer reviews, and it proposes a statistics-based outlier detection and correction method (SODCM), which helps identify such reviews and rectify their star ratings to enhance the performance of a sentiment analysis algorithm without any data loss. This paper focuses on applying SODCM to datasets containing customer reviews of various products, which are (a) scraped from Amazon.com and (b) publicly available. The paper also studies the datasets and draws conclusions about the effect of SODCM on the performance of a sentiment analysis algorithm. The results show that SODCM achieves higher accuracy and recall than other state-of-the-art anomaly detection algorithms.
first_indexed 2024-03-10T04:11:13Z
format Article
id doaj.art-7a158b9a1e0a4959ae0cd215c879b04d
institution Directory Open Access Journal
issn 1099-4300
language English
last_indexed 2024-03-10T04:11:13Z
publishDate 2021-12-01
publisher MDPI AG
record_format Article
series Entropy
spelling doaj.art-7a158b9a1e0a4959ae0cd215c879b04d
2023-11-23T08:11:04Z
eng
MDPI AG
Entropy
1099-4300
2021-12-01
Volume 23, Issue 12, Article 1645
10.3390/e23121645
Statistics-Based Outlier Detection and Correction Method for Amazon Customer Reviews
Ishani Chatterjee (0): Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ 07102, USA
Mengchu Zhou (1): Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ 07102, USA
Abdullah Abusorrah (2): Department of Electrical and Computer Engineering, Faculty of Engineering, and Center of Research Excellence in Renewable Energy and Power Systems, King Abdulaziz University, Jeddah 21481, Saudi Arabia
Khaled Sedraoui (3): Department of Electrical and Computer Engineering, Faculty of Engineering, and Center of Research Excellence in Renewable Energy and Power Systems, King Abdulaziz University, Jeddah 21481, Saudi Arabia
Ahmed Alabdulwahab (4): Department of Electrical and Computer Engineering, Faculty of Engineering, and Center of Research Excellence in Renewable Energy and Power Systems, King Abdulaziz University, Jeddah 21481, Saudi Arabia
https://www.mdpi.com/1099-4300/23/12/1645
sentiment analysis
interquartile range
TextBlob
natural language processing
outlier detection
data scrapping
title Statistics-Based Outlier Detection and Correction Method for Amazon Customer Reviews
topic sentiment analysis
interquartile range
TextBlob
natural language processing
outlier detection
data scrapping
url https://www.mdpi.com/1099-4300/23/12/1645