Toward Optimal LSTM Neural Networks for Detecting Algorithmically Generated Domain Names

Malware detection is a problem that has become particularly challenging over the last decade. A common strategy for detecting malware is to scan network traffic for malicious connections between infected devices and their command and control (C&C) servers. However, malware developers are...

Bibliographic Details
Main Authors:	Jose Selvi, Ricardo J. Rodriguez, Emilio Soria-Olivas
Format:	Article
Language:	English
Published:	IEEE 2021-01-01
Series:	IEEE Access
Subjects:	Deep learning; LSTM; malware; domain generation algorithms
Online Access:	https://ieeexplore.ieee.org/document/9531932/

_version_	1818596122044661760
author	Jose Selvi; Ricardo J. Rodriguez; Emilio Soria-Olivas
author_facet	Jose Selvi; Ricardo J. Rodriguez; Emilio Soria-Olivas
author_sort	Jose Selvi
collection	DOAJ
description	Malware detection is a problem that has become particularly challenging over the last decade. A common strategy for detecting malware is to scan network traffic for malicious connections between infected devices and their command and control (C&C) servers. However, malware developers are aware of this detection method and have begun to incorporate new strategies to go unnoticed. In particular, they generate domain names instead of using static Internet Protocol addresses or regular domain names pointing to their C&C servers. Using a domain generation algorithm reduces the effectiveness of domain blacklisting, because the large number of domain names that must be blocked greatly increases the size of the blacklist. In this paper, we study different Long Short-Term Memory neural network hyperparameters to find the best network configuration for algorithmically generated domain name detection. In particular, we focus on determining whether the (complex) feature engineering efforts required when using other machine learning techniques, such as Random Forest, can be avoided. In this regard, we have conducted a comparative analysis to study the effect of using different network sizes and configurations on network performance metrics. Our results show an accuracy of 97.62% and an area under the receiver operating characteristic curve of 0.9956 on the test dataset, indicating that it is possible to obtain good classification results while avoiding the feature engineering process and additional readjustments required in other machine learning techniques.
first_indexed	2024-12-16T11:26:53Z
format	Article
id	doaj.art-78ff8ccfba1d4befb5e80ea7652d1d2d
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-12-16T11:26:53Z
publishDate	2021-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-78ff8ccfba1d4befb5e80ea7652d1d2d | 2022-12-21T22:33:19Z | eng | IEEE | IEEE Access | 2169-3536 | 2021-01-01 | vol. 9 | pp. 126446-126456 | 10.1109/ACCESS.2021.3111307 | 9531932 | Toward Optimal LSTM Neural Networks for Detecting Algorithmically Generated Domain Names | Jose Selvi (0), https://orcid.org/0000-0001-7078-112X | Ricardo J. Rodriguez (1), https://orcid.org/0000-0001-7982-0359 | Emilio Soria-Olivas (2) | (0) Department of Electronic Engineering, Intelligent Data Analysis Laboratory (IDAL), ETSE, University of Valencia, Valencia, Spain | (1) Department of Computer Science and Systems Engineering, University of Zaragoza, Zaragoza, Spain | (2) Department of Electronic Engineering, Intelligent Data Analysis Laboratory (IDAL), ETSE, University of Valencia, Valencia, Spain | https://ieeexplore.ieee.org/document/9531932/ | Deep learning; LSTM; malware; domain generation algorithms
title	Toward Optimal LSTM Neural Networks for Detecting Algorithmically Generated Domain Names
topic	Deep learning; LSTM; malware; domain generation algorithms
url	https://ieeexplore.ieee.org/document/9531932/

Similar Items