PUMD: a PU learning-based malicious domain detection framework

Abstract
The Domain Name System (DNS), one of the most critical parts of the internet infrastructure, has been abused by various cyber attacks. Current malicious domain detection capabilities are limited by insufficient credible label information, severe class imbalance, and the incompact distribution of domain samples across different malicious activities. This paper proposes a malicious domain detection framework named PUMD, which introduces a Positive and Unlabeled (PU) learning solution to address insufficient label information, adopts customized sample weights to mitigate the impact of class imbalance, and constructs evidence features based on resource overlap to reduce the intra-class distance of malicious samples. In addition, a feature selection strategy based on permutation importance and binning is proposed to select the most informative detection features. Finally, we conduct experiments on the open-source real-world DNS traffic dataset provided by QI-ANXIN Technology Group to evaluate the PUMD framework’s ability to capture potential command and control (C&C) domains of malicious activities. The experimental results show that PUMD achieves the best detection performance under different label frequencies and class imbalance ratios.

Bibliographic Details
Main Authors:	Zhaoshan Fan, Qing Wang, Haoran Jiao, Junrong Liu, Zelin Cui, Song Liu, Yuling Liu
Affiliation:	Institute of Information Engineering, Chinese Academy of Sciences (all authors)
Format:	Article
Language:	English
Published:	SpringerOpen 2022-10-01
Series:	Cybersecurity
ISSN:	2523-3246
Subjects:	Malicious domain detection; Insufficient credible label information; Class imbalance; Incompact distribution; PU learning
Online Access:	https://doi.org/10.1186/s42400-022-00124-x
Collection:	DOAJ (Directory of Open Access Journals)
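
The abstract says PUMD uses a Positive and Unlabeled (PU) learning solution, but this record does not say which one. As an illustration only, the sketch below shows the classic Elkan and Noto (2008) adjustment under the "selected completely at random" (SCAR) assumption. A classifier g(x) is first trained to separate labeled positives (known malicious domains) from unlabeled domains. The label frequency c = P(s=1 | y=1) is then estimated as the mean of g over held-out labeled positives, and g(x)/c estimates P(y=1 | x). The helper names and toy scores are hypothetical, and PUMD's actual PU method may differ.

```python
"""Elkan and Noto (2008) PU adjustment under the SCAR assumption.

Hypothetical helper names; an illustration, not PUMD's actual method.
"""
from typing import List, Sequence, Tuple


def estimate_label_frequency(labeled_scores: Sequence[float]) -> float:
    """Estimate c = P(s=1 | y=1) as the mean score g(x) of the
    labeled-vs-unlabeled classifier on held-out labeled positives."""
    if not labeled_scores:
        raise ValueError("need at least one held-out labeled positive")
    c = sum(labeled_scores) / len(labeled_scores)
    if not 0.0 < c <= 1.0:
        raise ValueError("label frequency must lie in (0, 1]")
    return c


def positive_posterior(g: float, c: float) -> float:
    """Under SCAR, P(y=1 | x) = P(s=1 | x) / c.
    Clipped to 1 because the estimate of c is noisy."""
    return min(1.0, g / c)


def unlabeled_positive_weight(g: float, c: float) -> float:
    """Weight for treating an *unlabeled* sample as positive:
    w(x) = P(y=1 | x, s=0) = ((1 - c) / c) * g / (1 - g).
    The same sample is also used as a negative with weight 1 - w."""
    if g >= 1.0:
        return 1.0
    return min(1.0, (1.0 - c) / c * g / (1.0 - g))


def rank_unlabeled(
    domains: Sequence[str], scores: Sequence[float], c: float
) -> List[Tuple[str, float]]:
    """Rank unlabeled domains by estimated probability of being malicious."""
    ranked = [(d, positive_posterior(g, c)) for d, g in zip(domains, scores)]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy g(x) scores; in practice they come from any probabilistic classifier.
    c = estimate_label_frequency([0.4, 0.5, 0.6])
    print(c, rank_unlabeled(["a.example", "b.example"], [0.1, 0.45], c))
```

The weighting function reflects the design choice in Elkan and Noto's method: each unlabeled domain contributes partly as a positive and partly as a negative, instead of being treated as a certain negative.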

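The abstract also says PUMD builds evidence features from resource overlap so that malicious domains belonging to the same activity end up closer together. This record does not define which resources are used. The sketch below assumes a resource is a resolved IP address (the same idea could be applied to other shared infrastructure) and computes, for each domain, how many of its IPs are shared with other known malicious domains and what fraction of its IPs that represents. A known malicious domain's own label is excluded from its evidence (leave-one-out) to avoid label leakage. The function name and feature definitions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def overlap_evidence(
    resolutions: Dict[str, Set[str]],
    known_malicious: Iterable[str],
) -> Dict[str, Tuple[int, float]]:
    """Return {domain: (shared_ip_count, shared_ip_fraction)}.

    shared_ip_count: IPs of the domain that at least one *other*
    known malicious domain also resolves to.
    shared_ip_fraction: shared_ip_count / number of IPs of the domain.
    """
    malicious = set(known_malicious)
    # Inverted index: IP -> known malicious domains resolving to it.
    ip_to_malicious: Dict[str, Set[str]] = defaultdict(set)
    for domain in malicious:
        for ip in resolutions.get(domain, ()):
            ip_to_malicious[ip].add(domain)

    features: Dict[str, Tuple[int, float]] = {}
    for domain, ips in resolutions.items():
        # Leave-one-out: a domain's own label never counts as evidence.
        shared = sum(1 for ip in ips if ip_to_malicious.get(ip, set()) - {domain})
        fraction = shared / len(ips) if ips else 0.0
        features[domain] = (shared, fraction)
    return features


if __name__ == "__main__":
    resolutions = {
        "c2-a.example": {"192.0.2.1", "192.0.2.2"},
        "c2-b.example": {"192.0.2.2"},
        "unknown.example": {"192.0.2.2", "198.51.100.7"},
        "benign.example": {"203.0.113.5"},
    }
    print(overlap_evidence(resolutions, ["c2-a.example", "c2-b.example"]))
```

In this toy input, "unknown.example" shares one of its two IPs with known C&C domains, so it receives non-zero evidence even though it carries no label.
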

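The abstract further mentions a feature selection strategy based on permutation importance and binning. Permutation importance measures how much a fitted model's score drops when one feature column is shuffled, which breaks that feature's link to the label. How PUMD combines this with binning is not described in this record; as an assumption, the sketch groups the importance scores into equal-width bins and keeps only the features in the highest bin. The `predict` callable, the accuracy metric, and the bin count are placeholders.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[List[Row]], List[int]],
    X: List[Row],
    y: Sequence[int],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy rows with only column j replaced by the shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def select_top_bin(importances: Sequence[float], n_bins: int = 3) -> List[int]:
    """Indices of features whose importance falls in the highest of
    n_bins equal-width bins spanning [min, max] (assumed binning rule)."""
    lo, hi = min(importances), max(importances)
    if hi == lo:
        return list(range(len(importances)))
    width = (hi - lo) / n_bins
    selected = []
    for j, imp in enumerate(importances):
        bin_index = min(int((imp - lo) / width), n_bins - 1)
        if bin_index == n_bins - 1:
            selected.append(j)
    return selected


if __name__ == "__main__":
    # Toy data: feature 0 decides the label; features 1 and 2 are noise.
    X = [[i / 20, (i * 7 % 20) / 20, 0.5] for i in range(20)]
    y = [int(row[0] >= 0.5) for row in X]

    def model(rows: List[Row]) -> List[int]:
        return [int(r[0] >= 0.5) for r in rows]

    imps = permutation_importance(model, X, y)
    print(imps, select_top_bin(imps))
```

Features that the model ignores get an importance of exactly zero, so they fall into the lowest bin and are dropped.
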
Similar Items