Machine-Learning-Based Android Malware Family Classification Using Built-In and Custom Permissions

Malware family classification is grouping malware samples that have the same or similar characteristics into the same family. It plays a crucial role in understanding notable malicious patterns and recovering from malware infections. Although many machine learning approaches have been devised for th...


Bibliographic Details
Main Authors: Minki Kim, Daehan Kim, Changha Hwang, Seongje Cho, Sangchul Han, Minkyu Park
Format: Article
Language: English
Published: MDPI AG 2021-11-01
Series: Applied Sciences
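Subjects: Android malware; malware family classification; machine learning; built-in permission; custom permission; balanced accuracy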
Online Access: https://www.mdpi.com/2076-3417/11/21/10244
collection DOAJ
description Malware family classification is the task of grouping malware samples that have the same or similar characteristics into the same family. It plays a crucial role in understanding notable malicious patterns and recovering from malware infections. Although many machine learning approaches have been devised for this problem, several questions remain open, including: “Which features, classifiers, and evaluation metrics are better for malware familial classification?” In this paper, we propose a machine learning approach to Android malware family classification using built-in and custom permissions. Each Android app must declare proper permissions to access restricted resources or to perform restricted actions. Permission declaration is an efficient and obfuscation-resilient feature for malware analysis. We developed a malware family classification technique using permissions and conducted extensive experiments with several classifiers on a well-known dataset, DREBIN. We then evaluated the classifiers in terms of four metrics: macro-level F1-score, accuracy, balanced accuracy (BAC), and the Matthews correlation coefficient (MCC). BAC and the MCC are known to be appropriate for evaluating imbalanced data classification. Our experimental results showed that: (i) custom permissions had a positive impact on classification performance; (ii) even when the same classifier and the same feature information were used, there was a difference of up to 3.67% between accuracy and BAC; (iii) LightGBM and AdaBoost performed better than the other classifiers we considered.
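The four metrics named in the description can be illustrated with a minimal pure-Python sketch. It is not the authors' evaluation code: the family labels `FamilyA` and `FamilyB`, the toy predictions, and all function names are hypothetical, and the MCC uses the standard multi-class generalization. The point is only to show how accuracy and BAC can diverge when one family dominates the data.

```python
from collections import Counter
from math import sqrt


def _counts(y_true, y_pred):
    """Per-class true positives, actual counts, and predicted counts."""
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return tp, Counter(y_true), Counter(y_pred)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted family matches the true family."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    tp, actual, _ = _counts(y_true, y_pred)
    return sum(tp[c] / actual[c] for c in actual) / len(actual)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2*TP / (actual + predicted)."""
    tp, actual, predicted = _counts(y_true, y_pred)
    classes = set(actual) | set(predicted)
    return sum(2 * tp[c] / (actual[c] + predicted[c]) for c in classes) / len(classes)


def mcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient; 0.0 if undefined."""
    tp, actual, predicted = _counts(y_true, y_pred)
    n = len(y_true)
    classes = set(actual) | set(predicted)
    cov_tp = sum(tp.values()) * n - sum(actual[c] * predicted[c] for c in classes)
    cov_tt = n * n - sum(actual[c] ** 2 for c in classes)
    cov_pp = n * n - sum(predicted[c] ** 2 for c in classes)
    denom = sqrt(cov_tt * cov_pp)
    return cov_tp / denom if denom else 0.0


# Hypothetical imbalanced toy data: 8 samples of one family, 2 of another.
y_true = ["FamilyA"] * 8 + ["FamilyB"] * 2
y_pred = ["FamilyA"] * 8 + ["FamilyB", "FamilyA"]

print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.75
print(macro_f1(y_true, y_pred))
print(mcc(y_true, y_pred))
```

In this toy case the majority family is classified perfectly, so accuracy stays at 0.9, while the minority family's recall of 0.5 pulls BAC down to 0.75. This is the kind of gap between accuracy and BAC that the description reports for imbalanced family data.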
first_indexed 2024-03-10T06:06:22Z
id doaj.art-42e84cfec1774c0aa49eb9db94b74570
institution Directory Open Access Journal
issn 2076-3417
last_indexed 2024-03-10T06:06:22Z
spelling doaj.art-42e84cfec1774c0aa49eb9db94b74570 2023-11-22T20:30:02Z eng
MDPI AG, Applied Sciences, ISSN 2076-3417, 2021-11-01, volume 11, issue 21, article 10244
doi 10.3390/app112110244
Machine-Learning-Based Android Malware Family Classification Using Built-In and Custom Permissions
Minki Kim, Department of Data and Knowledge Service Engineering, Dankook University, Yongin 16890, Korea
Daehan Kim, Department of Data and Knowledge Service Engineering, Dankook University, Yongin 16890, Korea
Changha Hwang, Department of Statistics, Dankook University, Yongin 16890, Korea
Seongje Cho, Department of Software Science, Dankook University, Yongin 16890, Korea
Sangchul Han, Department of Computer Engineering, Konkuk University, Chungju 27478, Korea
Minkyu Park, Department of Computer Engineering, Konkuk University, Chungju 27478, Korea
https://www.mdpi.com/2076-3417/11/21/10244
Android malware; malware family classification; machine learning; built-in permission; custom permission; balanced accuracy
topic Android malware
malware family classification
machine learning
built-in permission
custom permission
balanced accuracy