Effective DGA-Domain Detection and Classification with TextCNN and Additional Features

Malicious code, such as that used in advanced persistent threat (APT) attacks, does not operate immediately after infecting a system, but only after receiving commands from the attacker’s command and control (C&C) server. The system infected by the malicious code tries to communicate with the C&C server thro...

Bibliographic Details
Main Authors: Chanwoong Hwang, Hyosik Kim, Hooki Lee, Taejin Lee
Format: Article
Language: English
Published: MDPI AG 2020-06-01
Series: Electronics
Subjects: security; domain generation algorithm; TextCNN; domain features; classification
Online Access: https://www.mdpi.com/2079-9292/9/7/1070
author Chanwoong Hwang
Hyosik Kim
Hooki Lee
Taejin Lee
collection DOAJ
description Malicious code, such as that used in advanced persistent threat (APT) attacks, does not operate immediately after infecting a system, but only after receiving commands from the attacker’s command and control (C&C) server. The infected system tries to communicate with the C&C server through the server’s IP address or domain address. If that address is hard-coded inside the malicious code, a defender can analyze the code to obtain the address and block access to the C&C server through security policy. To circumvent this address blocking technique, malware includes domain generation algorithms that generate domain addresses dynamically. A domain generation algorithm (DGA) generates domains randomly, so it is very difficult to identify and block the malicious domains. This paper therefore proposes a model to detect and classify unknown DGA domains effectively. We extract features that are effective for TextCNN-based label prediction, and add domain knowledge-based features to improve the model for detecting and classifying DGA-generated malicious domains. The proposed model achieved 99.19% accuracy for DGA classification (DGA versus legitimate domains) and 88.77% accuracy for DGA class classification. We expect that the proposed model can be applied to effectively detect and block DGA-generated domains.
first_indexed 2024-03-10T18:47:23Z
format Article
id doaj.art-54e47ff6a9a44d8ea98875ef9556329e
institution Directory Open Access Journal
issn 2079-9292
language English
last_indexed 2024-03-10T18:47:23Z
publishDate 2020-06-01
publisher MDPI AG
record_format Article
series Electronics
spelling doaj.art-54e47ff6a9a44d8ea98875ef9556329e
2023-11-20T05:24:43Z
Language: eng
Publisher: MDPI AG
Journal: Electronics
ISSN: 2079-9292
Published: 2020-06-01
Volume 9, Issue 7, Article 1070
DOI: 10.3390/electronics9071070
Chanwoong Hwang (Department of Information Security, Hoseo University, Asan 31499, Korea)
Hyosik Kim (Department of Information Security, Hoseo University, Asan 31499, Korea)
Hooki Lee (Department of Cyber Security Engineering, Konyang University, Nonsan 32992, Korea)
Taejin Lee (Department of Information Security, Hoseo University, Asan 31499, Korea)
title Effective DGA-Domain Detection and Classification with TextCNN and Additional Features
topic security
domain generation algorithm
TextCNN
domain features
classification
url https://www.mdpi.com/2079-9292/9/7/1070