A Machine Learning Framework for Domain Generation Algorithm-Based Malware Detection

Bibliographic Details
Main Authors:	Yi Li, Kaiqi Xiong, Tommy Chin, Chengbin Hu
Format:	Article
Language:	English
Published:	IEEE 2019-01-01
Series:	IEEE Access
Subjects:	Malware; domain generation algorithm; machine learning; security; networking
Online Access:	https://ieeexplore.ieee.org/document/8631171/
Volume:	7
Pages:	32765-32782
DOI:	10.1109/ACCESS.2019.2891588
ISSN:	2169-3536
Affiliations:	Yi Li, Kaiqi Xiong, Chengbin Hu: Intelligent Computer Networking and Security Lab, Florida Center for Cybersecurity, University of South Florida, Tampa, FL, USA
	Tommy Chin: Department of Computing Security, Rochester Institute of Technology, Rochester, NY, USA
ORCID:	Kaiqi Xiong: https://orcid.org/0000-0003-2933-8083
	Tommy Chin: https://orcid.org/0000-0003-0446-1325
Collection:	Directory of Open Access Journals (DOAJ)

Description

Attackers usually use a command and control (C2) server to manipulate the communication. In order to perform an attack, threat actors often employ a domain generation algorithm (DGA), which can allow malware to communicate with C2 by generating a variety of network locations. Traditional malware control methods, such as blacklisting, are insufficient to handle DGA threats. In this paper, we propose a machine learning framework for identifying and detecting DGA domains to alleviate the threat. We collect real-time threat data from the real-life traffic over a one-year period. We also propose a deep learning model to classify a large number of DGA domains. The proposed machine learning framework consists of a two-level model and a prediction model. In the two-level model, we first classify the DGA domains apart from normal domains and then use the clustering method to identify the algorithms that generate those DGA domains. In the prediction model, a time-series model is constructed to predict incoming domain features based on the hidden Markov model (HMM). Furthermore, we build a deep neural network (DNN) model to enhance the proposed machine learning framework by handling the huge dataset we gradually collected. Our extensive experimental results demonstrate the accuracy of the proposed framework and the DNN model. To be precise, we achieve an accuracy of 95.89% for the classification in the framework and 97.79% in the DNN model, 92.45% for the second-level clustering, and 95.21% for the HMM prediction in the framework.
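To illustrate why blacklisting struggles against DGA-based malware, here is a minimal Python sketch of a generic seeded DGA. It is hypothetical: the function `toy_dga`, the seed string, the SHA-256 construction, and the TLD list are assumptions chosen for illustration, not taken from the paper or from any real malware family. The point is only that malware and operator can derive the same rotating domain list from a shared seed and the date.

```python
import hashlib
from datetime import date

# Hypothetical toy DGA for illustration only (not a real malware family,
# not the paper's data). TLD choice is an arbitrary assumption.
TLDS = ("com", "net", "org")


def toy_dga(seed: str, day: date, count: int = 5, length: int = 12) -> list[str]:
    """Derive `count` pseudo-random domains from a shared seed and a date.

    Anyone holding the same seed computes the same list for the same day,
    so malware and its operator never need to exchange the domains.
    """
    domains = []
    for i in range(count):
        # Hash seed, date, and index together; digest bytes pick the letters.
        material = f"{seed}|{day.isoformat()}|{i}".encode()
        digest = hashlib.sha256(material).digest()  # 32 bytes
        label = "".join(chr(ord("a") + b % 26) for b in digest[:length])
        tld = TLDS[digest[length] % len(TLDS)]
        domains.append(f"{label}.{tld}")
    return domains


if __name__ == "__main__":
    today = toy_dga("example-seed", date(2019, 1, 1))
    tomorrow = toy_dga("example-seed", date(2019, 1, 2))
    print(today)
    # Overlap between consecutive days; almost certainly empty.
    print(set(today) & set(tomorrow))
```

Because the list changes with the date, a blacklist compiled from one day's domains is very unlikely to match the next day's, which is the gap that learned detectors aim to close.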
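The first level of the framework separates DGA domains from normal domains. The abstract does not list the features used, so this sketch assumes a common choice of lexical features: label length, Shannon entropy of the characters, and digit and vowel ratios. The names `lexical_features`, `shannon_entropy`, and `registered_label`, and the naive way of picking the label, are hypothetical. A real pipeline would feed such vectors to a trained classifier, which is not shown.

```python
import math
from collections import Counter

VOWELS = frozenset("aeiou")


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def registered_label(domain: str) -> str:
    """Naively take the label just left of the TLD ('google' in 'www.google.com').

    Assumption: ignores multi-part public suffixes such as 'co.uk'.
    """
    parts = domain.lower().rstrip(".").split(".")
    return parts[-2] if len(parts) >= 2 else parts[0]


def lexical_features(domain: str) -> dict[str, float]:
    """Hypothetical per-domain feature vector for a first-level classifier."""
    label = registered_label(domain)
    n = len(label) or 1  # avoid division by zero on empty labels
    return {
        "length": float(len(label)),
        "entropy": shannon_entropy(label),
        "digit_ratio": sum(ch.isdigit() for ch in label) / n,
        "vowel_ratio": sum(ch in VOWELS for ch in label) / n,
    }


if __name__ == "__main__":
    for d in ("www.google.com", "xkqz7vbn3wpl.net"):
        print(d, lexical_features(d))
```

Randomly generated labels tend to show higher entropy and fewer vowels than human-chosen names, which is why features of this kind are a frequent starting point; they are not claimed to be the paper's feature set.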
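The second level uses a clustering method to group DGA domains by the algorithm that generated them. The abstract does not name the clustering algorithm, so this sketch substitutes plain k-means (Lloyd's algorithm) over numeric feature vectors, using only the standard library. The function `kmeans` and its parameters are illustrative assumptions, not the authors' method.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's k-means on a list of equal-length numeric tuples.

    Returns (labels, centroids). Initial centroids are k distinct input
    points chosen with a seeded RNG so runs are reproducible.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical (entropy, vowel_ratio) pairs from two imagined DGA families.
    pts = [(3.5, 0.10), (3.6, 0.12), (3.4, 0.08), (2.4, 0.40), (2.5, 0.38), (2.3, 0.42)]
    print(kmeans(pts, k=2))
```

k-means needs the number of families `k` up front; if that is unknown, density-based methods are a common alternative, but which one the paper uses is not stated here.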
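The prediction model uses a hidden Markov model to forecast features of incoming domains. This sketch shows one standard way an HMM produces such a forecast. The normalized forward algorithm computes the belief over hidden states given the observed symbols; one transition step followed by the emission matrix then gives the distribution of the next symbol. The two hidden states, the binary observation alphabet (0 = low-entropy bucket, 1 = high-entropy bucket), and every probability below are made-up assumptions. The paper's states, features, and training procedure are not reproduced.

```python
# Hypothetical 2-state HMM; all parameters are illustrative, not fitted.
START = [0.5, 0.5]                 # P(state at t=0)
TRANS = [[0.9, 0.1], [0.2, 0.8]]   # TRANS[i][j] = P(next state j | state i)
EMIT = [[0.8, 0.2], [0.3, 0.7]]    # EMIT[s][o] = P(symbol o | state s)


def forward_filter(obs, start, trans, emit):
    """P(hidden state at the last step | obs), via the normalized forward algorithm.

    `obs` must be a non-empty sequence of symbol indices.
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    total = sum(alpha)
    alpha = [a / total for a in alpha]
    for o in obs[1:]:
        # Propagate through transitions, then weight by the new observation.
        alpha = [emit[j][o] * sum(alpha[i] * trans[i][j] for i in range(n)) for j in range(n)]
        total = sum(alpha)
        alpha = [a / total for a in alpha]  # renormalize to avoid underflow
    return alpha


def predict_next(obs, start, trans, emit):
    """One-step-ahead predictive distribution over the next observation symbol."""
    belief = forward_filter(obs, start, trans, emit)
    n = len(start)
    next_state = [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]
    return [sum(next_state[j] * emit[j][k] for j in range(n)) for k in range(len(emit[0]))]


if __name__ == "__main__":
    print(predict_next([1, 1, 0, 1], START, TRANS, EMIT))
```

In practice the parameters would be estimated from historical traffic (for example with Baum-Welch) rather than set by hand.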

Similar Items