Bayesian Hyper-Parameter Optimisation for Malware Detection

Malware detection is a major security concern and has been the subject of a great deal of research and development. Machine learning is a natural technology for addressing malware detection, and many researchers have investigated its use. However, the performance of machine learning algorithms often...


Bibliographic Details
Main Authors: Fahad T. ALGorain, John A. Clark
Format: Article
Language: English
Published: MDPI AG 2022-05-01
Series: Electronics
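Subjects: hyper-parameter optimisation; automated machine learning; static malware detection; tree parzen estimators; bayesian optimisation; random search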
Online Access: https://www.mdpi.com/2079-9292/11/10/1640
author Fahad T. ALGorain
John A. Clark
collection DOAJ
description Malware detection is a major security concern and has been the subject of a great deal of research and development. Machine learning is a natural technology for addressing malware detection, and many researchers have investigated its use. However, the performance of machine learning algorithms often depends significantly on parametric choices, so the question arises as to what parameter choices are optimal. In this paper, we investigate how best to tune the parameters of machine learning algorithms (a process generally known as hyper-parameter optimisation) in the context of malware detection. We examine the effects of some simple (model-free) ways of parameter tuning together with a state-of-the-art Bayesian model-building approach. Our work is carried out using Ember, a major published malware benchmark dataset of Windows Portable Executable metadata samples, and a smaller dataset from kaggle.com (also comprising Windows Portable Executable metadata). We demonstrate that optimal parameter choices may differ significantly from default choices and argue that hyper-parameter optimisation should be adopted as a ‘formal outer loop’ in the research and development of malware detection systems. We also argue that doing so is essential for the development of the discipline since it facilitates a fair comparison of competing machine learning algorithms applied to the malware detection problem.
first_indexed 2024-03-10T03:59:30Z
format Article
id doaj.art-eb50a013f73147d6b43f57ce647d15ae
institution Directory of Open Access Journals
issn 2079-9292
language English
last_indexed 2024-03-10T03:59:30Z
publishDate 2022-05-01
publisher MDPI AG
record_format Article
series Electronics
spelling doaj.art-eb50a013f73147d6b43f57ce647d15ae
2023-11-23T10:48:09Z
eng
MDPI AG
Electronics
2079-9292
2022-05-01
Volume 11, Issue 10, Article 1640
10.3390/electronics11101640
Bayesian Hyper-Parameter Optimisation for Malware Detection
Fahad T. ALGorain (Department of Computer Science, University of Sheffield, Sheffield S10 2TN, UK)
John A. Clark (Department of Computer Science, University of Sheffield, Sheffield S10 2TN, UK)
https://www.mdpi.com/2079-9292/11/10/1640
hyper-parameter optimisation
automated machine learning
static malware detection
tree parzen estimators
bayesian optimisation
random search
title Bayesian Hyper-Parameter Optimisation for Malware Detection
topic hyper-parameter optimisation
automated machine learning
static malware detection
tree parzen estimators
bayesian optimisation
random search
url https://www.mdpi.com/2079-9292/11/10/1640