Bayesian Hyper-Parameter Optimisation for Malware Detection
Main Authors: | Fahad T. ALGorain, John A. Clark |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-05-01 |
Series: | Electronics |
Online Access: | https://www.mdpi.com/2079-9292/11/10/1640 |
author | Fahad T. ALGorain, John A. Clark |
---|---|
collection | DOAJ |
description | Malware detection is a major security concern and has been the subject of a great deal of research and development. Machine learning is a natural technology for addressing malware detection, and many researchers have investigated its use. However, the performance of machine learning algorithms often depends significantly on parametric choices, so the question arises as to what parameter choices are optimal. In this paper, we investigate how best to tune the parameters of machine learning algorithms (a process generally known as hyper-parameter optimisation) in the context of malware detection. We examine the effects of some simple (model-free) ways of parameter tuning together with a state-of-the-art Bayesian model-building approach. Our work is carried out using Ember, a major published malware benchmark dataset of Windows Portable Executable metadata samples, and a smaller dataset from kaggle.com (also comprising Windows Portable Executable metadata). We demonstrate that optimal parameter choices may differ significantly from default choices and argue that hyper-parameter optimisation should be adopted as a ‘formal outer loop’ in the research and development of malware detection systems. We also argue that doing so is essential for the development of the discipline since it facilitates a fair comparison of competing machine learning algorithms applied to the malware detection problem. |
first_indexed | 2024-03-10T03:59:30Z |
id | doaj.art-eb50a013f73147d6b43f57ce647d15ae |
institution | Directory of Open Access Journals |
issn | 2079-9292 |
last_indexed | 2024-03-10T03:59:30Z |
volume | 11 |
issue | 10 |
article_number | 1640 |
doi | 10.3390/electronics11101640 |
affiliation | Department of Computer Science, University of Sheffield, Sheffield S10 2TN, UK (both authors) |
title | Bayesian Hyper-Parameter Optimisation for Malware Detection |
topic | hyper-parameter optimisation; automated machine learning; static malware detection; tree Parzen estimators; Bayesian optimisation; random search |