Malytics: A Malware Detection Scheme

An important problem of cyber-security is malware analysis. Besides good precision and recognition rate, ideally, a malware detection scheme needs to be able to generalize well for novel malware families (a.k.a. zero-day attacks). It is important that the system does not require excessive computation...

Bibliographic Details
Main Authors: Mahmood Yousefi-Azar, Leonard G. C. Hamey, Vijay Varadharajan, Shiping Chen
Format: Article
Language: English
Published: IEEE 2018-01-01
Series: IEEE Access
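Subjects: Malware detection; static analysis; binary level n-grams; term frequency simhashing; extreme learning machine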
Online Access: https://ieeexplore.ieee.org/document/8463441/
_version_ 1818871911042514944
author Mahmood Yousefi-Azar
Leonard G. C. Hamey
Vijay Varadharajan
Shiping Chen
author_sort Mahmood Yousefi-Azar
collection DOAJ
description An important problem of cyber-security is malware analysis. Besides good precision and recognition rate, ideally, a malware detection scheme needs to be able to generalize well for novel malware families (a.k.a. zero-day attacks). It is important that the system does not require excessive computation, particularly for deployment on mobile devices. In this paper, we propose a novel scheme to detect malware, which we call Malytics. It is not dependent on any particular tool or operating system. It extracts static features of any given binary file to distinguish malware from benign files. Malytics consists of three stages: feature extraction, similarity measurement, and classification. The three stages are implemented by a neural network with two hidden layers and an output layer. We show that feature extraction, which is performed by tf-simhashing, is equivalent to the first layer of a particular neural network. We evaluate Malytics' performance on both Android and Windows platforms. Malytics outperforms a wide range of learning-based techniques and also individual state-of-the-art models on both platforms. We also show Malytics is resilient and robust in addressing zero-day malware samples. The F1-score of Malytics is 97.21% and 99.45% on Android dex files and Windows PE files, respectively, in the applied datasets. The speed and efficiency of Malytics are also evaluated.
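The feature-extraction stage named in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `byte_ngrams` and `tf_simhash`, the 2-byte n-gram size, the 64-dimensional output, and the use of BLAKE2b as the hash are all assumptions made for illustration. The sketch also keeps the real-valued weighted sums instead of taking their signs as classic simhash does, which is likewise an assumption about how such a vector would feed a later similarity stage.

```python
import hashlib
from collections import Counter
from typing import List


def byte_ngrams(data: bytes, n: int = 2) -> List[bytes]:
    """Return all overlapping byte n-grams of `data` (hypothetical helper)."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def tf_simhash(data: bytes, n: int = 2, bits: int = 64) -> List[int]:
    """Term-frequency-weighted simhash vector of a binary blob.

    Each distinct n-gram is hashed to `bits` bits. Bit j contributes +tf
    to dimension j when it is set and -tf when it is clear, where tf is
    the n-gram's count in `data`. `bits` must be a multiple of 8 and at
    most 512 because BLAKE2b digests are limited to 64 bytes.
    """
    counts = Counter(byte_ngrams(data, n))
    acc = [0] * bits
    for gram, tf in counts.items():
        digest = hashlib.blake2b(gram, digest_size=bits // 8).digest()
        h = int.from_bytes(digest, "big")
        for j in range(bits):
            sign = 1 if (h >> j) & 1 else -1
            acc[j] += sign * tf
    return acc


if __name__ == "__main__":
    vec = tf_simhash(b"MZ\x90\x00\x03\x00\x00\x00")
    print(len(vec), vec[:8])
```

Viewed this way, every n-gram acts as an input unit whose hash bits are fixed +1/-1 weights and whose term frequency is its activation, which is one way to read the description's remark that the step is equivalent to the first layer of a neural network.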
first_indexed 2024-12-19T12:30:26Z
format Article
id doaj.art-4d4cf8f9973f4076bd53e7cff2519705
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-19T12:30:26Z
publishDate 2018-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-4d4cf8f9973f4076bd53e7cff2519705 2022-12-21T20:21:24Z eng
IEEE, IEEE Access, ISSN 2169-3536, 2018-01-01, vol. 6, pp. 49418-49431
DOI 10.1109/ACCESS.2018.2864871, IEEE Xplore document 8463441
Malytics: A Malware Detection Scheme
Mahmood Yousefi-Azar (https://orcid.org/0000-0002-1029-6584), Department of Computing, Faculty of Science and Engineering, Macquarie University, Sydney, NSW, Australia
Leonard G. C. Hamey, Department of Computing, Faculty of Science and Engineering, Macquarie University, Sydney, NSW, Australia
Vijay Varadharajan, Faculty of Engineering and Built Environment, The University of Newcastle, Callaghan, NSW, Australia
Shiping Chen, Commonwealth Scientific and Industrial Research Organisation, Data61, Marsfield, NSW, Australia
https://ieeexplore.ieee.org/document/8463441/
Malware detection; static analysis; binary level n-grams; term frequency simhashing; extreme learning machine
title Malytics: A Malware Detection Scheme
topic Malware detection
static analysis
binary level n-grams
term frequency simhashing
extreme learning machine
url https://ieeexplore.ieee.org/document/8463441/