A Heterogeneous Machine Learning Ensemble Framework for Malicious Webpage Detection

The growing dependence on digital systems has heightened the risks posed by cybersecurity threats. This paper proposes a new method for detecting malicious webpages among several adversary activities. As shown in previous studies, malicious URL detection performance is significantly affected by the...


Bibliographic Details
Main Authors: Sam-Shin Shin, Seung-Goo Ji, Sung-Sam Hong
Format: Article
Language: English
Published: MDPI AG 2022-11-01
Series: Applied Sciences
Subjects: security; malicious URL detection; machine learning; ensemble learning; artificial intelligence
Online Access: https://www.mdpi.com/2076-3417/12/23/12070
collection DOAJ
description The growing dependence on digital systems has heightened the risks posed by cybersecurity threats. Among the many forms of adversarial activity, this paper proposes a new method for detecting malicious webpages. Previous studies have shown that malicious URL detection performance is significantly affected by the features of the training dataset. The overall performance of different machine learning models varies with the data features, so relying on a single model is not desirable in every environment. To address these limitations, we propose an ensemble approach that combines different machine learning models. The proposed method outperforms the existing single model by 6%, detecting an additional 141 malicious URLs. In this study, repetitive tasks are automated, improving the performance of the individual machine learning models. In addition, the proposed framework builds an advanced feature set based on URL and web content and includes an optimized detection model structure. The proposed technology can contribute to defining an advanced feature set based on URL and web content and to research on automated technology for detecting malicious websites, such as phishing sites and sites that distribute malicious code.
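The ensemble idea in the description can be illustrated with a minimal sketch. It assumes soft voting (averaging each model's estimated probability), which is one common way to combine heterogeneous models; the description does not say which combination rule the authors use. The feature names, the three stand-in models, and every threshold below are hypothetical and chosen only for illustration. In practice each model would be a trained classifier from a different family.

```python
from typing import Callable, Dict, List
from urllib.parse import urlparse

Features = Dict[str, float]
Model = Callable[[Features], float]  # returns an estimate of P(malicious) in [0, 1]


def url_features(url: str) -> Features:
    """Extract a few illustrative lexical features (hypothetical feature set)."""
    host = urlparse(url).hostname or ""
    return {
        "length": float(len(url)),
        "num_dots": float(host.count(".")),
        "has_at": 1.0 if "@" in url else 0.0,
        # Crude check for a dotted-decimal IP address used as the host.
        "host_is_ip": 1.0 if host.replace(".", "").isdigit() else 0.0,
    }


# Stand-ins for trained models of different types; all scores are invented.
def length_model(f: Features) -> float:
    return 0.9 if f["length"] > 75 else 0.2


def host_model(f: Features) -> float:
    if f["host_is_ip"]:
        return 0.95
    return 0.7 if f["num_dots"] > 3 else 0.1


def symbol_model(f: Features) -> float:
    return 0.8 if f["has_at"] else 0.3


def soft_vote(models: List[Model], features: Features, threshold: float = 0.5) -> bool:
    """Average the models' scores and flag the URL if the mean reaches the threshold."""
    mean_score = sum(model(features) for model in models) / len(models)
    return mean_score >= threshold


def is_malicious(url: str) -> bool:
    models = [length_model, host_model, symbol_model]
    return soft_vote(models, url_features(url))
```

Calling `is_malicious("http://192.168.0.1/login@secure")` flags the URL because two of the three stand-in models score it highly, while `is_malicious("https://example.com/")` does not. The point of the sketch is only that a single weak model (here, `length_model`) can miss a case that the combined vote catches.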
first_indexed 2024-03-09T17:53:57Z
format Article
id doaj.art-1e435e86850e41e49bfcc2ceaca10a01
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-09T17:53:57Z
publishDate 2022-11-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-1e435e86850e41e49bfcc2ceaca10a01; 2023-11-24T10:30:30Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2022-11-01; 12(23), 12070; 10.3390/app122312070
Sam-Shin Shin (Internet Incident Response Technology Team, Korea Internet & Security Agency, Naju 58324, Republic of Korea)
Seung-Goo Ji (Internet Incident Response Technology Team, Korea Internet & Security Agency, Naju 58324, Republic of Korea)
Sung-Sam Hong (Department of Multimedia Contents, Jangan University, Hwaseong 18331, Republic of Korea)
topic security
malicious URL detection
machine learning
ensemble learning
artificial intelligence
url https://www.mdpi.com/2076-3417/12/23/12070