Fraudulent e-Commerce website detection model using HTML, text and image features

Bibliographic Details
Main Authors: Khoo, Eric; Zainal, Anazida; Ariffin, Nurfadilah; Kassim, Mohd. Nizam; Maarof, Mohd Aizaini; Bakhtiari, Majid
Format: Conference or Workshop Item
Published: 2020
Subjects: QA75 Electronic computers. Computer science
Collection: ePrints
Description: Many Internet users have fallen victim to fraudulent e-commerce websites, and their number is growing. This paper investigates three types of website features, namely HTML tags, textual content and images, that may contain patterns indicating that a site is fraudulent. Four machine learning algorithms were used to measure the accuracy of fraudulent e-commerce website detection: Linear Regression, Decision Tree, Random Forest and XGBoost. A dataset of 497 e-commerce websites was used for training and testing. Testing was done in two phases: in phase one, each feature type was tested individually to assess its discriminative capability; in phase two, the feature types were combined. The results show that textual content consistently outperformed the other two feature types, especially when XGBoost was used as the classifier. Combining the features improved overall accuracy, and the best accuracy recorded was 98.7%, achieved with Linear Regression as the classifier.
Record ID: utm.eprints-94157
Institution: Universiti Teknologi Malaysia - ePrints
Citation: Khoo, Eric and Zainal, Anazida and Ariffin, Nurfadilah and Kassim, Mohd. Nizam and Maarof, Mohd Aizaini and Bakhtiari, Majid (2020) Fraudulent e-Commerce website detection model using HTML, text and image features. In: 11th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2019) and 11th World Congress on Nature and Biologically Inspired Computing (NaBIC 2019), 13–15 December 2019, Hyderabad, India. Peer reviewed.
DOI: http://dx.doi.org/10.1007/978-3-030-49345-5_19
Record URL: http://eprints.utm.my/94157/
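The abstract above does not say how the HTML-tag and textual features were extracted. The sketch below assumes tag-frequency counts and a bag of words over visible page text. Both are hypothetical choices, not the authors' method. It uses only Python's standard-library `html.parser` to produce one sparse feature dictionary per page, then selects a single feature group (as in phase one) or several groups together (as in phase two). The `html:`/`text:` key prefixes and the helper names `PageFeatureParser`, `extract_features` and `select` are illustrative inventions. Image features are omitted.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical choice: text inside these tags is not treated as visible page text.
NON_VISIBLE_TAGS = ("script", "style")


class PageFeatureParser(HTMLParser):
    """Collects HTML tag counts and visible-text tokens from one page (illustrative)."""

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()  # tag name -> number of occurrences
        self.tokens = []             # lower-cased alphabetic words of visible text
        self._hidden_depth = 0       # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag in NON_VISIBLE_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in NON_VISIBLE_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if self._hidden_depth == 0:
            self.tokens.extend(w.lower() for w in data.split() if w.isalpha())


def extract_features(html):
    """Returns one sparse feature dict with 'html:'- and 'text:'-prefixed keys."""
    parser = PageFeatureParser()
    parser.feed(html)
    parser.close()
    features = {f"html:{tag}": float(n) for tag, n in parser.tag_counts.items()}
    features.update({f"text:{w}": float(n) for w, n in Counter(parser.tokens).items()})
    return features


def select(features, prefixes):
    """Keeps only the feature groups whose key starts with one of `prefixes`."""
    return {k: v for k, v in features.items() if k.startswith(prefixes)}


# Hypothetical example page, not taken from the paper's dataset.
page = (
    "<html><body><h1>Big SALE</h1><p>Pay by bank transfer only</p>"
    "<script>var x=1;</script><img src='a.jpg'></body></html>"
)
features = extract_features(page)
html_only = select(features, ("html:",))          # one feature type on its own
text_only = select(features, ("text:",))
combined = select(features, ("html:", "text:"))   # feature types combined
print(sorted(html_only))
# prints ['html:body', 'html:h1', 'html:html', 'html:img', 'html:p', 'html:script']
```

Keeping every group in one prefixed dictionary makes the two testing phases a matter of choosing prefixes. Any of the resulting dictionaries could then be vectorized, for example with scikit-learn's `DictVectorizer`, and passed to a classifier such as those named in the abstract.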