Incorporating known malware signatures to classify new malware variants in network traffic

Content-based malware classification using n-gram features incurs high computational overhead because of the size of the feature space. This paper proposes augmenting machine learning techniques with domain knowledge, in the form of known Snort malware signatures, to reduce resources (i...


Bibliographic Details
Main Authors: Ismail, Ismahani, Marsono, Muhammad Nadzir, Khammas, Ban Mohammed, Mohd. Nor, Sulaiman
Format: Article
Published: John Wiley and Sons 2015
Subjects: TK Electrical engineering. Electronics Nuclear engineering
_version_ 1796860228207116288
author Ismail, Ismahani
Marsono, Muhammad Nadzir
Khammas, Ban Mohammed
Mohd. Nor, Sulaiman
collection ePrints
description Content-based malware classification using n-gram features incurs high computational overhead because of the size of the feature space. This paper proposes augmenting machine learning techniques with domain knowledge, in the form of known Snort malware signatures, to reduce resource usage (the time to generate the machine learning model and the memory used to store the generative model). Although current malware can be encrypted or mutated, it still exhibits much of the same content or payloads as its predecessors. Using a dataset of traffic captured from a campus network, the approach reduces the million n-gram features initially generated to only around 90,000 features, which cuts the processing time to generate a naive Bayes model by 95%. The model trained on the most descriptive features (4-gram Snort signatures with high information gain) produces a lower false negative rate, about 2%, compared with the other models. Moreover, the proposed method is capable of detecting 10 new malware variants with 0% false negatives. The findings of this paper can serve as a basis for improving content-based malware classification to detect known and new malware.
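The pipeline outlined in the description (4-gram extraction, restriction to n-grams found in known signatures, information-gain selection, naive Bayes training) can be illustrated with a minimal sketch. This is not the authors' implementation: the signature strings and payloads are invented toy data, the Bernoulli form of naive Bayes with Laplace smoothing is an assumption (the record only says "naive Bayes"), and the information-gain cutoff of zero is an arbitrary choice for the example.

```python
import math
from collections import Counter

def byte_ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of distinct n-byte substrings of data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def signature_vocabulary(signatures, n: int = 4) -> set:
    """Union of the n-grams of all signature content strings."""
    vocab = set()
    for sig in signatures:
        vocab |= byte_ngrams(sig, n)
    return vocab

def entropy(pos: int, neg: int) -> float:
    """Binary entropy (in bits) of a split with pos/neg counts."""
    total = pos + neg
    return -sum(c / total * math.log2(c / total) for c in (pos, neg) if c)

def information_gain(feature, samples) -> float:
    """samples: list of (ngram_set, label), label 1 = malware, 0 = benign."""
    n = len(samples)
    malware = sum(label for _, label in samples)
    with_f = [label for grams, label in samples if feature in grams]
    without_f = [label for grams, label in samples if feature not in grams]
    conditional = sum(
        len(part) / n * entropy(sum(part), len(part) - sum(part))
        for part in (with_f, without_f) if part
    )
    return entropy(malware, n - malware) - conditional

def train_naive_bayes(samples, features: set) -> dict:
    """Bernoulli naive Bayes with Laplace smoothing (assumed variant)."""
    class_counts = Counter(label for _, label in samples)
    present = {c: Counter() for c in class_counts}
    for grams, label in samples:
        present[label].update(grams & features)
    return {
        c: (math.log(count / len(samples)),
            {f: (present[c][f] + 1) / (count + 2) for f in features})
        for c, count in class_counts.items()
    }

def classify(model: dict, grams: set) -> int:
    """Return the class with the highest log-posterior score."""
    def score(c):
        prior, probs = model[c]
        return prior + sum(math.log(p if f in grams else 1 - p)
                           for f, p in probs.items())
    return max(model, key=score)

# Hypothetical signature contents and labelled payloads (toy data only).
signatures = [b"EVILCMD", b"\x90\x90\x90\x90\xeb"]
payloads = [
    (b"GET /a?x=EVILCMD HTTP/1.1", 1),
    (b"\x00\x01\x90\x90\x90\x90\xeb\x05", 1),
    (b"GET /index.html HTTP/1.1", 0),
    (b"POST /login HTTP/1.1", 0),
]
samples = [(byte_ngrams(p), y) for p, y in payloads]
all_grams = set().union(*(g for g, _ in samples))

# Keep only n-grams that also occur in a known signature, then filter by IG.
vocab = signature_vocabulary(signatures)
selected = {f for f in all_grams & vocab if information_gain(f, samples) > 0}
model = train_naive_bayes(samples, selected)

variant = b"HEAD /b?y=EVILCMD&z=1 HTTP/1.0"   # unseen payload reusing old content
benign = b"GET /about.html HTTP/1.1"
print(len(all_grams), "->", len(selected), "features")
print(classify(model, byte_ngrams(variant)), classify(model, byte_ngrams(benign)))
```

In this toy setting the signature filter shrinks the feature set to a handful of 4-grams, which is the mechanism the description credits for the reduction in model-building time and memory; the figures reported in the record come from the authors' campus dataset, not from this sketch.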
first_indexed 2024-03-05T19:38:48Z
format Article
id utm.eprints-55819
institution Universiti Teknologi Malaysia - ePrints
last_indexed 2024-03-05T19:38:48Z
publishDate 2015
publisher John Wiley and Sons
record_format dspace
spelling utm.eprints-55819 2017-02-15T01:08:09Z http://eprints.utm.my/55819/ John Wiley and Sons 2015-11 Article PeerReviewed Ismail, Ismahani and Marsono, Muhammad Nadzir and Khammas, Ban Mohammed and Mohd. Nor, Sulaiman (2015) Incorporating known malware signatures to classify new malware variants in network traffic. International Journal of Network Management, 25 (6). pp. 471-489. ISSN 1055-7148 http://dx.doi.org/10.1002/nem.1913
title Incorporating known malware signatures to classify new malware variants in network traffic
topic TK Electrical engineering. Electronics Nuclear engineering