A Closer Look at Machine Learning Effectiveness in Android Malware Detection

Nowadays, with the increasing usage of Android devices in daily life activities, malware has been increasing rapidly, putting peoples’ security and privacy at risk. To mitigate this threat, several researchers have proposed different methods to detect Android malware. Recently, machine learning base...

Full description

Bibliographic Details
Main Authors: Filippos Giannakas, Vasileios Kouliaridis, Georgios Kambourakis
Format: Article
Language:English
Published: MDPI AG 2022-12-01
Series:Information
Subjects:
Online Access:https://www.mdpi.com/2078-2489/14/1/2
Description
Summary:Nowadays, with the increasing usage of Android devices in daily life activities, malware has been increasing rapidly, putting peoples’ security and privacy at risk. To mitigate this threat, several researchers have proposed different methods to detect Android malware. Recently, machine learning based models have been explored by a significant mass of researchers checking for Android malware. However, selecting the most appropriate model is not straightforward, since there are several aspects that must be considered. Contributing to this domain, the current paper explores Android malware detection from diverse perspectives; this is achieved by optimizing and evaluating various machine learning algorithms. Specifically, we conducted an experiment for training, optimizing, and evaluating 27 machine learning algorithms, and a Deep Neural Network (DNN). During the optimization phase, we performed hyperparameter analysis using the Optuna framework. The evaluation phase includes the measurement of different performance metrics against a contemporary, rich dataset, to conclude with the most accurate model. The best model was further interpreted by conducting feature analysis, using the Shapley Additive Explanations (SHAP) framework. Our experiment results showed that the best model is the DNN consisting of four layers (two hidden), using the Adamax optimizer, as well as the Binary Cross-Entropy (loss), and the Softsign activation functions. The model succeeded with 86% prediction accuracy, while the balanced accuracy, the F1-score, and the ROC-AUC metrics were at 82%.
ISSN:2078-2489