PUMD: a PU learning-based malicious domain detection framework

Bibliographic Details
Main Authors: Zhaoshan Fan, Qing Wang, Haoran Jiao, Junrong Liu, Zelin Cui, Song Liu, Yuling Liu
Format: Article
Language: English
Published: SpringerOpen 2022-10-01
Series: Cybersecurity
Online Access: https://doi.org/10.1186/s42400-022-00124-x
ISSN: 2523-3246
Affiliation (all authors): Institute of Information Engineering, Chinese Academy of Sciences
Keywords: Malicious domain detection; Insufficient credible label information; Class imbalance; Incompact distribution; PU learning

Abstract
The Domain Name System (DNS), one of the most critical pieces of internet infrastructure, has been abused by various cyber attacks. Current malicious domain detection capabilities are limited by insufficient credible label information, severe class imbalance, and the incompact distribution of domain samples across different malicious activities. This paper proposes a malicious domain detection framework named PUMD, which introduces a Positive and Unlabeled (PU) learning solution to address insufficient label information, adopts customized sample weights to mitigate the impact of class imbalance, and constructs evidence features based on resource overlap to reduce the intra-class distance of malicious samples. In addition, a feature selection strategy based on permutation importance and binning is proposed to screen the most informative detection features. Finally, we conduct experiments on the open-source real-world DNS traffic dataset provided by QI-ANXIN Technology Group to evaluate PUMD's ability to capture potential command and control (C&C) domains used in malicious activities. The experimental results show that PUMD achieves the best detection performance under different label frequencies and class imbalance ratios.
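The abstract names PU learning, label frequencies and class-imbalance weighting without giving formulas. As a hedged illustration only, the sketch below implements the classic Elkan and Noto (2008) PU estimators under the "selected completely at random" (SCAR) assumption, where the label frequency is c = P(s=1 | y=1), together with scikit-learn-style inverse-frequency class weights as a generic stand-in for PUMD's customized sample weights, which the abstract does not specify. It is not the authors' implementation, and every function name is hypothetical. The scores g(x) = P(s=1 | x) are assumed to come from any probabilistic classifier trained to separate labeled from unlabeled domains.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def estimate_label_frequency(g_labeled: Sequence[float]) -> float:
    """Elkan-Noto estimator of the label frequency c = P(s=1 | y=1).

    g_labeled holds scores g(x) = P(s=1 | x) on held-out *labeled* positives;
    under SCAR, their mean approximates c.
    """
    if not g_labeled:
        raise ValueError("need at least one labeled positive")
    return sum(g_labeled) / len(g_labeled)


def positive_posterior(g: float, c: float) -> float:
    """Convert a labeled-vs-unlabeled score into P(y=1 | x) = g(x) / c, clipped to 1."""
    if c <= 0.0:
        raise ValueError("label frequency must be positive")
    return min(1.0, g / c)


def unlabeled_positive_weight(g: float, c: float) -> float:
    """Weight w(x) = P(y=1 | x, s=0) for an unlabeled example.

    In Elkan-Noto weighting, each unlabeled example is duplicated: one copy is
    a positive with weight w, the other a negative with weight 1 - w.
    """
    if c <= 0.0:
        raise ValueError("label frequency must be positive")
    if g >= 1.0:
        return 1.0
    w = ((1.0 - c) / c) * (g / (1.0 - g))
    return min(1.0, w)  # clip: noisy estimates of c or g can push w above 1


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Inverse-frequency weights n / (k * n_k), as in scikit-learn's
    class_weight='balanced'; a generic stand-in, not PUMD's custom weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * n_k) for cls, n_k in counts.items()}


if __name__ == "__main__":
    c = estimate_label_frequency([0.6, 0.4, 0.5])  # hypothetical classifier scores
    print("c =", c)
    print("P(y=1|x) for g=0.25:", positive_posterior(0.25, c))
    print("w for unlabeled g=0.25:", unlabeled_positive_weight(0.25, c))
    print(balanced_class_weights([1] * 2 + [0] * 8))  # 2 malicious, 8 benign
```

With c = 0.5, an unlabeled domain scored g(x) = 0.25 maps to P(y=1 | x) = 0.5 and an Elkan-Noto weight of 1/3. A smaller c means fewer positives were labeled, so the same score maps to a larger positive posterior.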