Intelligent deep machine learning cyber phishing URL detection based on BERT features extraction

Phishing attacks have recently become a critical threat to cyberspace security. Phishing is a form of fraud that lures people and businesses into visiting malicious uniform resource locators (URLs) and submitting sensitive information such as passwords, credit card IDs, and personal details. Enor...

Bibliographic Details
Main Authors: Muna Elsadig, Ashraf Osman Ibrahim Elsayed, Shakila Basheer, Manal Abdullah Alohali, Sara Alshunaifi, Haya Alqahtani, Nihal Alharbi, Wamda Nagmeldin
Format: Article
Language: English
Published: MDPI 2022
Subjects: QA76.75-76.765 Computer software
Online Access: https://eprints.ums.edu.my/id/eprint/36792/1/ABSTRACT.pdf
https://eprints.ums.edu.my/id/eprint/36792/2/FULLTEXT.pdf
_version_ 1796911879936802816
author Muna Elsadig
Ashraf Osman Ibrahim Elsayed
Shakila Basheer
Manal Abdullah Alohali
Sara Alshunaifi
Haya Alqahtani
Nihal Alharbi
Wamda Nagmeldin
author_sort Muna Elsadig
collection UMS
description Phishing attacks have recently become a critical threat to cyberspace security. Phishing is a form of fraud that lures people and businesses into visiting malicious uniform resource locators (URLs) and submitting sensitive information such as passwords, credit card IDs, and personal details. Large numbers of sophisticated attacks are launched dynamically to trick users into believing they are accessing a reliable website or online application, so that attackers can acquire account information. As phishing grows more sophisticated and malicious every day, researchers are motivated to create intelligent models and offer secure services on the web. This paper introduces a novel URL phishing detection technique based on BERT feature extraction and a deep learning method. BERT was used to extract the URLs’ text from the Phishing Site Predict dataset. A natural language processing (NLP) algorithm was then applied to the unique data column, extracting a large number of useful features in the form of meaningful text information. Next, a deep convolutional neural network was used to detect phishing URLs: it composes words or n-grams to extract higher-level features, and the data are then classified into legitimate and phishing URLs. The proposed method was evaluated on a well-known public dataset of phishing website URLs containing 549,346 entries. In addition, three scenarios were developed to compare the outcomes of the proposed method on similar datasets. The feature extraction process depends on natural language processing techniques. The experiments showed that the proposed method achieved 96.66% accuracy, and the results were compared with those reported in other works in the literature. The results showed that the proposed method was efficient and valid in detecting phishing websites’ URLs.
first_indexed 2024-03-06T03:25:19Z
format Article
id ums.eprints-36792
institution Universiti Malaysia Sabah
language English
last_indexed 2024-03-06T03:25:19Z
publishDate 2022
publisher MDPI
record_format dspace
spelling ums.eprints-36792
2023-09-14T02:04:08Z
https://eprints.ums.edu.my/id/eprint/36792/
QA76.75-76.765 Computer software
MDPI 2022-11-08 Article NonPeerReviewed
text en https://eprints.ums.edu.my/id/eprint/36792/1/ABSTRACT.pdf
text en https://eprints.ums.edu.my/id/eprint/36792/2/FULLTEXT.pdf
Muna Elsadig and Ashraf Osman Ibrahim Elsayed and Shakila Basheer and Manal Abdullah Alohali and Sara Alshunaifi and Haya Alqahtani and Nihal Alharbi and Wamda Nagmeldin (2022) Intelligent deep machine learning cyber phishing URL detection based on BERT features extraction. Electronics, 11 (3647). pp. 1-18. https://doi.org/10.3390/electronics11223647
title Intelligent deep machine learning cyber phishing URL detection based on BERT features extraction
topic QA76.75-76.765 Computer software
url https://eprints.ums.edu.my/id/eprint/36792/1/ABSTRACT.pdf
https://eprints.ums.edu.my/id/eprint/36792/2/FULLTEXT.pdf