Cyber Threat Intelligence-Based Malicious URL Detection Model Using Ensemble Learning

Web applications have become ubiquitous for many business sectors due to their platform independence and low operation cost. Billions of users are visiting these applications to accomplish their daily tasks. However, many of these applications are either vulnerable to web defacement attacks or creat...

Bibliographic Details
Main Authors:	Mohammed Alsaedi, Fuad A. Ghaleb, Faisal Saeed, Jawad Ahmad, Mohammed Alasli
Format:	Article
Language:	English
Published:	MDPI AG 2022-04-01
Series:	Sensors
Subjects:	malicious URLs, cyber threat intelligence, ensemble learning, internet security, cybersecurity
Online Access:	https://www.mdpi.com/1424-8220/22/9/3373

_version_	1797502873962020864
author	Mohammed Alsaedi, Fuad A. Ghaleb, Faisal Saeed, Jawad Ahmad, Mohammed Alasli
author_facet	Mohammed Alsaedi, Fuad A. Ghaleb, Faisal Saeed, Jawad Ahmad, Mohammed Alasli
author_sort	Mohammed Alsaedi
collection	DOAJ
description	Web applications have become ubiquitous in many business sectors due to their platform independence and low operation cost. Billions of users visit these applications to accomplish their daily tasks. However, many of these applications are either vulnerable to web defacement attacks or created and managed by hackers, as with fraudulent and phishing websites. Detecting malicious websites is essential to prevent the spread of malware and to protect end-users from becoming victims. However, most existing solutions rely on extracting features from the website’s content, which can be harmful to the detection machines themselves and is subject to obfuscation. Detecting malicious Uniform Resource Locators (URLs) is safer and more efficient than content analysis. However, the detection of malicious URLs is still not well addressed due to insufficient features and inaccurate classification. This study aims to improve the accuracy of malicious URL detection by designing and developing a cyber threat intelligence-based malicious URL detection model using two-stage ensemble learning. Reports from cybersecurity analysts and users around the globe can provide important information regarding malicious websites. Therefore, cyber threat intelligence (CTI)-based features extracted from Google searches and Whois websites are used to improve detection performance. The study also proposes a two-stage ensemble learning model that combines the random forest (RF) algorithm for preclassification with a multilayer perceptron (MLP) for final decision making. The trained MLP classifier replaces the majority voting scheme of the three trained random forest classifiers for decision making. The probabilistic output of the weak classifiers of the random forest is aggregated and used as input to the MLP classifier for the final classification. Results show that the extracted CTI-based features with the two-stage classification outperform other studies’ detection models. The proposed CTI-based detection model achieved a 7.8% accuracy improvement and a 6.7% reduction in false-positive rates compared with the traditional URL-based model.
first_indexed	2024-03-10T03:42:22Z
format	Article
id	doaj.art-d55f80bd2ac54884bc1d2800212b58fb
institution	Directory Open Access Journal
issn	1424-8220
language	English
last_indexed	2024-03-10T03:42:22Z
publishDate	2022-04-01
publisher	MDPI AG
record_format	Article
series	Sensors
spelling	doaj.art-d55f80bd2ac54884bc1d2800212b58fb; 2023-11-23T09:17:30Z; eng; MDPI AG; Sensors; 1424-8220; 2022-04-01; 22; 9; 3373; 10.3390/s22093373; Cyber Threat Intelligence-Based Malicious URL Detection Model Using Ensemble Learning; Mohammed Alsaedi (College of Computer Science and Engineering, Taibah University, P.O. Box 344, Medina 41411, Saudi Arabia); Fuad A. Ghaleb (School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia, Johor Bahru 81310, Malaysia); Faisal Saeed (College of Computer Science and Engineering, Taibah University, P.O. Box 344, Medina 41411, Saudi Arabia); Jawad Ahmad (School of Computing, Edinburgh Napier University, Edinburgh EH10 5DT, UK); Mohammed Alasli (College of Computer Science and Engineering, Taibah University, P.O. Box 344, Medina 41411, Saudi Arabia); https://www.mdpi.com/1424-8220/22/9/3373; malicious URLs; cyber threat intelligence; ensemble learning; internet security; cybersecurity
spellingShingle	Mohammed Alsaedi Fuad A. Ghaleb Faisal Saeed Jawad Ahmad Mohammed Alasli Cyber Threat Intelligence-Based Malicious URL Detection Model Using Ensemble Learning Sensors malicious URLs cyber threat intelligence ensemble learning internet security cybersecurity
title	Cyber Threat Intelligence-Based Malicious URL Detection Model Using Ensemble Learning
title_full	Cyber Threat Intelligence-Based Malicious URL Detection Model Using Ensemble Learning
title_fullStr	Cyber Threat Intelligence-Based Malicious URL Detection Model Using Ensemble Learning
title_full_unstemmed	Cyber Threat Intelligence-Based Malicious URL Detection Model Using Ensemble Learning
title_short	Cyber Threat Intelligence-Based Malicious URL Detection Model Using Ensemble Learning
title_sort	cyber threat intelligence based malicious url detection model using ensemble learning
topic	malicious URLs, cyber threat intelligence, ensemble learning, internet security, cybersecurity
url	https://www.mdpi.com/1424-8220/22/9/3373
work_keys_str_mv	AT mohammedalsaedi cyberthreatintelligencebasedmaliciousurldetectionmodelusingensemblelearning AT fuadaghaleb cyberthreatintelligencebasedmaliciousurldetectionmodelusingensemblelearning AT faisalsaeed cyberthreatintelligencebasedmaliciousurldetectionmodelusingensemblelearning AT jawadahmad cyberthreatintelligencebasedmaliciousurldetectionmodelusingensemblelearning AT mohammedalasli cyberthreatintelligencebasedmaliciousurldetectionmodelusingensemblelearning
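
The two-stage ensemble summarized in the description field (random forests for preclassification, an MLP in place of majority voting for the final decision) might be sketched roughly as follows. This is an illustration only, not the authors' implementation: the record does not say how features are divided among the three forests, so the three column groups here are hypothetical stand-ins (for example URL-based, Whois-based, and Google-search-based features), the data is synthetic, all hyperparameters are arbitrary, and the use of out-of-fold probabilities to train the MLP is an added design choice rather than something the record states.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neural_network import MLPClassifier


def fit_two_stage(X_train, y_train, feature_groups, seed=0):
    """Stage 1: one random forest per feature group. Stage 2: an MLP on their probabilities."""
    forests, meta_columns = [], []
    for cols in feature_groups:
        rf = RandomForestClassifier(n_estimators=50, random_state=seed)
        # Assumption: out-of-fold probabilities are used so the MLP is not
        # trained on predictions the forest made for its own training rows.
        oof = cross_val_predict(
            rf, X_train[:, cols], y_train, cv=3, method="predict_proba"
        )[:, 1]
        meta_columns.append(oof)
        forests.append(rf.fit(X_train[:, cols], y_train))
    # The MLP takes the place of majority voting over the three forests.
    mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=1000, random_state=seed)
    mlp.fit(np.column_stack(meta_columns), y_train)
    return forests, mlp


def predict_two_stage(forests, mlp, X, feature_groups):
    """Feed each forest's malicious-class probability into the trained MLP."""
    meta = np.column_stack(
        [rf.predict_proba(X[:, cols])[:, 1] for rf, cols in zip(forests, feature_groups)]
    )
    return mlp.predict(meta)


# Synthetic stand-in data: 12 numeric features, label 1 = malicious, 0 = benign.
X, y = make_classification(n_samples=600, n_features=12, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Hypothetical feature groups, one per first-stage forest.
feature_groups = [list(range(0, 4)), list(range(4, 8)), list(range(8, 12))]

forests, mlp = fit_two_stage(X_train, y_train, feature_groups)
preds = predict_two_stage(forests, mlp, X_test, feature_groups)
print("held-out accuracy on synthetic data:", float((preds == y_test).mean()))
```

Here each forest's `predict_proba` output (the average over its trees) serves as the aggregated probabilistic output; passing per-tree probabilities to the MLP instead would be another reading of the description.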

Similar Items