Detection of Data Scarce Malware Using One-Shot Learning With Relation Network

Malware has evolved to pose a major threat to information security. Efficient anti-malware software is essential in safeguarding confidential information from these threats. However, identifying malware continues to be a challenging task. Signature-based detection methods are quick but fail to detec...


Bibliographic Details
Main Authors:	Faiza Babar Khan, Muhammad Hanif Durad, Asifullah Khan, Farrukh Aslam Khan, Sajjad Hussain Chauhdary, Mohammed Alqarni
Format:	Article
Language:	English
Published:	IEEE 2023-01-01
Series:	IEEE Access
Subjects:	Data-scarce malware; feature embedding; meta-learning; one-shot learning; relation network
Online Access:	https://ieeexplore.ieee.org/document/10175371/

_version_	1827893495844569088
author	Faiza Babar Khan; Muhammad Hanif Durad; Asifullah Khan; Farrukh Aslam Khan; Sajjad Hussain Chauhdary; Mohammed Alqarni
author_facet	Faiza Babar Khan; Muhammad Hanif Durad; Asifullah Khan; Farrukh Aslam Khan; Sajjad Hussain Chauhdary; Mohammed Alqarni
author_sort	Faiza Babar Khan
collection	DOAJ
description	Malware has evolved to pose a major threat to information security, and efficient anti-malware software is essential for safeguarding confidential information. However, identifying malware remains a challenging task. Signature-based detection methods are quick but fail to detect unknown malware. Additionally, the traditional machine learning paradigm requires a large amount of data to be effective, which hinders the ability of an anti-malware system to learn quickly about new threats from limited training samples. In real-world settings, the majority of malware is found in the form of Portable Executable (PE) files. PE files come in various formats, but samples of some of them, such as ocx, acm, com, and scr, are not readily available in large numbers. Building a conventional Machine Learning (ML) model that generalizes well to these data-scarce PE formats is therefore difficult. In such a scenario, Few-Shot Learning (FSL), which makes predictions from a very small number of training samples, helps detect malware. In this paper, we propose a novel FSL architecture based on the Relation Network. A proposed Discriminative Feature Embedder extracts features, which are passed to our proposed Relation Module (RM) for similarity measurement. The RM produces relation scores that lead to improved classification. We use the PE file formats ocx, acm, com, and scr after transforming them into images. We employ five-shot and then one-shot learning; the proposed architecture outperforms the baseline method, reaching up to 94% accuracy with only one training sample.
first_indexed	2024-03-12T21:53:36Z
format	Article
id	doaj.art-e19f3578fb004f8ca6e4c9ae3b075f8f
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-03-12T21:53:36Z
publishDate	2023-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-e19f3578fb004f8ca6e4c9ae3b075f8f | 2023-07-25T23:01:02Z | eng | IEEE | IEEE Access | ISSN 2169-3536 | 2023-01-01 | vol. 11, pp. 74438-74457 | DOI 10.1109/ACCESS.2023.3293117 | IEEE document 10175371
	Faiza Babar Khan (https://orcid.org/0000-0002-6751-8360), CIPMA Laboratory, DCIS, Pakistan Institute of Engineering and Applied Sciences, Islamabad, Pakistan
	Muhammad Hanif Durad (https://orcid.org/0000-0002-8026-1045), CIPMA Laboratory, DCIS, Pakistan Institute of Engineering and Applied Sciences, Islamabad, Pakistan
	Asifullah Khan (https://orcid.org/0000-0003-2039-5305), Pattern Recognition Laboratory, DCIS, PIEAS, Nilore, Islamabad, Pakistan
	Farrukh Aslam Khan (https://orcid.org/0000-0002-7023-7172), Center of Excellence in Information Assurance (CoEIA), King Saud University, Riyadh, Saudi Arabia
	Sajjad Hussain Chauhdary (https://orcid.org/0000-0001-8552-5786), Department of Computer Science and Artificial Intelligence, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
	Mohammed Alqarni (https://orcid.org/0000-0002-3284-537X), Department of Software Engineering, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
title	Detection of Data Scarce Malware Using One-Shot Learning With Relation Network
topic	Data-scarce malware; feature embedding; meta-learning; one-shot learning; relation network
url	https://ieeexplore.ieee.org/document/10175371/
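
The pipeline summarized in the description (PE bytes rendered as an image, an embedding per sample, a relation score per support/query pair, and the highest score deciding the class) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not specify the image layout, the Discriminative Feature Embedder, or the Relation Module, so the fixed-width byte reshaping, the histogram embedding, and the cosine score below are simple stand-ins chosen for illustration, and all function names and the labels are hypothetical.

```python
import math
from typing import Dict, List


def bytes_to_image(data: bytes, width: int = 16) -> List[List[int]]:
    """Reshape raw bytes into rows of grayscale pixels (0-255), zero-padding the last row.

    Assumption: one byte per pixel with a fixed row width; the record does not
    state how the paper converts PE files to images.
    """
    padded = data + bytes(-len(data) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def embed(image: List[List[int]], bins: int = 16) -> List[float]:
    """Stand-in embedder: normalized intensity histogram (the paper learns its embedder)."""
    hist = [0.0] * bins
    total = 0
    for row in image:
        for px in row:
            hist[px * bins // 256] += 1.0
            total += 1
    return [h / total for h in hist] if total else hist


def relation_score(a: List[float], b: List[float]) -> float:
    """Stand-in relation function: cosine similarity (the paper learns this score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def one_shot_classify(support: Dict[str, bytes], query: bytes) -> str:
    """Return the label whose single support sample scores highest against the query."""
    q = embed(bytes_to_image(query))
    scores = {
        label: relation_score(embed(bytes_to_image(sample)), q)
        for label, sample in support.items()
    }
    return max(scores, key=scores.get)
```

In a one-shot episode, `support` holds exactly one sample per class, for example `{"benign": ..., "malware": ...}`, and `one_shot_classify` compares the query against each of them. In a learned Relation Network, `embed` and `relation_score` would be trained networks, not fixed functions.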


Similar Items