Frequency analysis and online learning in malware detection

Bibliographic Details
Main Author: Huynh, Ngoc Anh
Other Authors: Ng Wee Keong
Format: Thesis
Language: English
Published: 2019
Subjects:
Online Access: https://hdl.handle.net/10356/93574
http://hdl.handle.net/10220/49944
Description
Summary: Traditional antivirus products are signature-based solutions, which rely on a static database to perform detection. The weakness of this design is that the signatures may become outdated, resulting in the failure to detect new samples. The other method is behavior-based detection, which aims to identify malware based on its dynamic behavior. Behavior-based detection comes in two approaches. The first approach leverages commonly known behaviors of malware such as random domain name generation and periodicity. The second approach aims to learn the behavior of malware directly from data using tools such as graph analytics and machine learning. Behavior-based detection is difficult because we have to deal with intelligent and highly motivated attackers, who can change their strategy to maximize the chance of getting access to computer networks. We narrow our research to the domain of Windows malware detection and we are particularly interested in two approaches of behavior-based detection: periodic behavior and behavior evolution. Periodic behavior refers to the regular activities programmed by attackers such as periodic polling for server connection or periodic update of the victim machine's status. Behavior evolution refers to the change in behavior of malware over time. In the first approach, we aim to exploit the periodic behavior for malware detection. The main analysis tool in this direction is the Fourier transform, which is used to convert time-domain signals into frequency-domain signals. This idea is motivated by the fact that it is often easier to analyze periodic signals in the frequency domain than in the original time domain. Using the Fourier transform, we propose a novel frequency-based periodicity measure to evaluate the regularity of network traffic. Another challenge in this direction is that, other than malware, most automatic services of operating systems also generate periodic signals. To address this challenge, we propose a new visual analytics solution for effective alert verification. In the second approach, we aim to develop adaptive learning algorithms to capture malware samples whose behavior changes over time. We capitalize on the well-known online machine learning framework of Follow the Regularized Leader (FTRL). Our main contribution in this direction is the use of an adaptive decaying factor that allows FTRL algorithms to perform better in environments with concept drift. The decaying factor increasingly discounts the contribution of past examples, thereby alleviating the problem of concept drift. We advance the state of the art in this direction by proposing a new adaptive online algorithm to handle the problem of concept drift in malware detection. Our improved algorithm has also been successfully applied to other non-security domains.
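
The frequency-domain idea in the summary can be illustrated with a small sketch. This is not the periodicity measure proposed in the thesis, which the summary does not define. As a stand-in, it assumes a simple score: the share of spectral power held by the strongest non-zero frequency bin of a traffic time series. The function names `power_spectrum` and `periodicity_score` and the sample signals are hypothetical.

```python
import cmath
import math


def power_spectrum(signal):
    """Power at DFT bins 1..n//2, computed with a naive O(n^2) DFT.

    The mean is subtracted first so a constant traffic level (the DC
    component) does not count as periodicity.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    powers = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        powers.append(abs(coeff) ** 2)
    return powers


def periodicity_score(signal):
    """Assumed measure: fraction of power in the strongest bin, in [0, 1]."""
    powers = power_spectrum(signal)
    total = sum(powers)
    if total == 0:
        return 0.0  # flat signal: no variation, hence no periodicity
    return max(powers) / total


if __name__ == "__main__":
    n = 64
    # Hypothetical traffic volume that oscillates every 8 time slots.
    smooth = [10 + 5 * math.sin(2 * math.pi * t / 8) for t in range(n)]
    # Hypothetical beacon: one connection every 8 time slots.
    beacon = [1.0 if t % 8 == 0 else 0.0 for t in range(n)]
    # A single one-off burst of activity.
    burst = [1.0 if t == 0 else 0.0 for t in range(n)]
    for name, sig in [("smooth", smooth), ("beacon", beacon), ("burst", burst)]:
        print(name, round(periodicity_score(sig), 4))
```

In this sketch a sinusoidal signal concentrates almost all of its power in one bin, while a one-off burst spreads it evenly across all bins. A pulse-train beacon falls in between because its power is shared among the harmonics of the beacon frequency, so a practical measure would probably need to account for harmonics as well.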
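
The decaying-factor idea in the summary can also be sketched in a few lines. The thesis describes an adaptive decaying factor; the summary does not say how it is adapted, so this sketch assumes a fixed factor `decay` and should not be read as the author's algorithm. It uses the textbook form of FTRL on linearized logistic losses with an L2 regularizer, where the weights have the closed form w = -eta * z and z is the discounted sum of past gradients. The class name `DecayedFTRL`, the helper `post_drift_mistakes`, and the toy stream are hypothetical.

```python
import math


class DecayedFTRL:
    """FTRL with an L2 regularizer and exponentially discounted gradients.

    After t rounds the weights minimize
        sum_s decay**(t - s) * (g_s . w) + ||w||^2 / (2 * eta),
    which gives w = -eta * z with z the discounted gradient sum.
    With decay = 1.0 no past example is discounted.
    """

    def __init__(self, dim, eta=0.5, decay=0.9):
        self.eta = eta
        self.decay = decay  # assumption: fixed, not adaptive as in the thesis
        self.z = [0.0] * dim

    def weights(self):
        return [-self.eta * zi for zi in self.z]

    def predict_proba(self, x):
        score = sum(w * xi for w, xi in zip(self.weights(), x))
        score = max(-30.0, min(30.0, score))  # keep exp() in a safe range
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, x, y):
        """Predict first, then fold the logistic-loss gradient into z."""
        p = self.predict_proba(x)
        # Gradient of the logistic loss with respect to w is (p - y) * x.
        self.z = [self.decay * zi + (p - y) * xi for zi, xi in zip(self.z, x)]
        return p


def post_drift_mistakes(decay, n_before=200, n_after=50):
    """Count errors after the labeling rule flips on a toy one-feature stream."""
    model = DecayedFTRL(dim=1, decay=decay)
    mistakes = 0
    for t in range(n_before + n_after):
        x = [1.0 if t % 2 == 0 else -1.0]
        drifted = t >= n_before
        # Before the drift y = 1 when x > 0; afterwards the rule is inverted.
        y = int((x[0] > 0) != drifted)
        p = model.update(x, y)
        if drifted and (p >= 0.5) != (y == 1):
            mistakes += 1
    return mistakes


if __name__ == "__main__":
    print("no decay :", post_drift_mistakes(1.0))
    print("decay 0.9:", post_drift_mistakes(0.9))
```

On this toy stream the discounted learner recovers from the flipped rule after fewer mistakes, because the gradients accumulated before the drift lose weight at every step instead of having to be cancelled one by one.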