Behind the Bait: Delving into PhishTank's hidden data

Bibliographic Details
Main Authors: Affan Yasin, Rubia Fatima, Javed Ali Khan, Wasif Afzal
Format: Article
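Language: English
Published: Elsevier 2024-02-01
Series: Data in Brief
Subjects: Phished URL; Social engineering; Email security; Web security; Computer security; Artificial intelligence
Online Access: http://www.sciencedirect.com/science/article/pii/S2352340923009903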
collection DOAJ
description Phishing is a form of social engineering that aims to deceive individuals through email communication. Extensive prior research has identified phishing as one of the most commonly used attack vectors for infiltrating organizational networks. A prevalent method is to mislead the target with phishing URLs disguised behind hyperlinks. PhishTank, a crowdsourced website, aggregates submitted phishing URLs and then verifies whether they are genuine phishing sites. In this study, we used a Python script to extract data from the PhishTank website, amassing a dataset of over 190,000 phishing URLs. Researchers and practitioners can use this dataset to improve phishing filters, strengthen firewalls, support security education, and train and test models, among other applications.
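The abstract mentions collecting PhishTank URLs with a Python script and reusing them to improve phishing filters, but gives no detail. The sketch below is not the authors' script. It is a minimal illustration, assuming rows shaped like a PhishTank-style CSV export with hypothetical columns `phish_id`, `url`, and `verified`, of turning such a dataset into a hostname blocklist. The function names and sample rows are hypothetical.

```python
import csv
import io
from urllib.parse import urlsplit

# ASSUMPTION: column names (phish_id, url, verified) are modeled on a
# PhishTank-style CSV export; the authors' scraped dataset may differ.
SAMPLE_CSV = """phish_id,url,verified
1,http://login-example.test/verify,yes
2,http://login-example.test/verify,yes
3,https://Bank-Secure.test/signin,yes
4,http://unconfirmed.test/,no
"""


def load_verified_urls(csv_text: str) -> list[str]:
    """Return unique URLs from rows marked verified, in first-seen order."""
    seen: set[str] = set()
    urls: list[str] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Skip entries the (hypothetical) verified column does not confirm.
        if (row.get("verified") or "").strip().lower() != "yes":
            continue
        url = (row.get("url") or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def build_host_blocklist(urls: list[str]) -> set[str]:
    """Reduce URLs to hostnames; urlsplit lowercases the hostname."""
    hosts: set[str] = set()
    for url in urls:
        host = urlsplit(url).hostname  # None when no network location is present
        if host:
            hosts.add(host)
    return hosts


def is_blocked(url: str, blocklist: set[str]) -> bool:
    """Coarse check: flag any URL whose hostname appears in the blocklist."""
    host = urlsplit(url).hostname
    return host is not None and host in blocklist


if __name__ == "__main__":
    blocklist = build_host_blocklist(load_verified_urls(SAMPLE_CSV))
    print(sorted(blocklist))
    print(is_blocked("https://login-example.test/other-page", blocklist))
```

Matching on hostnames is deliberately coarse: it catches other paths on a known phishing host, but it would also flag legitimate pages on shared hosting domains, so a practical filter would likely need finer rules.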
first_indexed 2024-03-08T03:30:27Z
id doaj.art-959b41dde3f747e4bc5822d457d0e2aa
institution Directory of Open Access Journals
issn 2352-3409
last_indexed 2024-03-08T03:30:27Z
volume 52
article_number 109959
affiliation Affan Yasin: School of Software, Northwestern Polytechnical University, Xian 710072, Shaanxi, China
Rubia Fatima: School of Software, Tsinghua University, Beijing, China
Javed Ali Khan: Department of Computer Science, School of Physics, Engineering & Computer Science, University of Hertfordshire, Hatfield, UK
Wasif Afzal (corresponding author): School of Innovation, Design and Engineering, Mälardalen University, Västerås, Sweden
topic Phished URL
Social engineering
Email security
Web security
Computer security
Artificial intelligence