Android malware classification using optimum feature selection and ensemble machine learning

The majority of smartphones on the market run on the Android operating system. Security has been a core concern with this platform since it allows users to install apps from unknown sources. With thousands of apps being produced and launched daily, malware detection using Machine Learning (ML) has a...

Bibliographic Details
Main Authors: Rejwana Islam, Moinul Islam Sayed, Sajal Saha, Mohammad Jamal Hossain, Md Abdul Masud
Format: Article
Language: English
Published: KeAi Communications Co., Ltd. 2023-01-01
Series: Internet of Things and Cyber-Physical Systems
Subjects: Android; Malware; Category classification; Dynamic analysis; Supervised ML; Ensemble
Online Access: http://www.sciencedirect.com/science/article/pii/S2667345223000202
author Rejwana Islam
Moinul Islam Sayed
Sajal Saha
Mohammad Jamal Hossain
Md Abdul Masud
collection DOAJ
description The majority of smartphones on the market run the Android operating system. Security has been a core concern with this platform because it allows users to install apps from unknown sources. With thousands of apps produced and launched daily, malware detection using Machine Learning (ML) has attracted significant attention compared to traditional detection techniques. Despite academic and commercial efforts, developing an efficient and reliable method for classifying malware remains challenging. As a result, several datasets for malware analysis have been generated and made available during the past ten years. These datasets may contain static features, such as API calls, intents, and permissions, or dynamic features, such as logcat errors, shared memory, and system calls. Dynamic analysis is more resilient to code obfuscation. Although both binary and multi-class classification have been carried out in recent studies, the latter provides more valuable insight into the nature of malware. Because each malware variant operates differently, identifying its category might help prevent it. Using the well-known ensemble ML approach called weighted voting, this study performed dynamic feature analysis for multi-class classification. Random Forest, K-Nearest Neighbors, Multi-Layer Perceptrons, Decision Trees, Support Vector Machines, and Logistic Regression are all studied in this ensemble model. We used a recent dataset named CCCS-CIC-AndMal-2020, which contains an extensive collection of Android applications and malware samples. A well-researched data preparation phase followed by weighted voting based on the R² scores of the ML classifiers achieves an accuracy of 95.0% even after excluding 60.2% of the features, outperforming all recent studies.
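The weighted voting idea named in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `weighted_vote`, the classifier keys, the category labels, and the weight values are all hypothetical, chosen only to show how per-classifier scores can break what would otherwise be a tied majority vote.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight for one sample.

    predictions: dict mapping classifier name -> predicted label
    weights:     dict mapping classifier name -> non-negative weight
                 (for example, a validation score for that classifier)
    """
    totals = defaultdict(float)
    for name, label in predictions.items():
        totals[label] += weights[name]
    # Iterate labels in sorted order so ties resolve deterministically.
    return max(sorted(totals), key=totals.get)


# Hypothetical predictions from six base classifiers for a single app.
predictions = {
    "rf": "Adware", "knn": "Trojan", "mlp": "Adware",
    "dt": "Trojan", "svm": "Trojan", "lr": "Adware",
}
# Hypothetical weights; the record says weights were based on R² scores,
# but these numbers are made up for illustration.
weights = {"rf": 0.9, "knn": 0.5, "mlp": 0.8, "dt": 0.6, "svm": 0.4, "lr": 0.3}

result = weighted_vote(predictions, weights)  # "Adware" (2.0 vs 1.5)
```

With equal weights the six votes would split three to three; the weights let the better-scoring classifiers decide the outcome.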
first_indexed 2024-03-12T14:37:33Z
format Article
id doaj.art-d2ff128763a149f2a2ea07bd0cd7a0d5
institution Directory Open Access Journal
issn 2667-3452
language English
last_indexed 2024-03-12T14:37:33Z
publishDate 2023-01-01
publisher KeAi Communications Co., Ltd.
record_format Article
series Internet of Things and Cyber-Physical Systems
spelling doaj.art-d2ff128763a149f2a2ea07bd0cd7a0d5
2023-08-17T04:28:05Z
eng
KeAi Communications Co., Ltd.
Internet of Things and Cyber-Physical Systems
2667-3452
2023-01-01
Volume 3, pages 100-111
Android malware classification using optimum feature selection and ensemble machine learning
Rejwana Islam (Computer Science and Information Technology, Patuakhali Science and Technology University, Dumki, Patuakhali, Bangladesh; corresponding author)
Moinul Islam Sayed (Computer Science, Western University, London, Ontario, Canada)
Sajal Saha (Computer Science, Western University, London, Ontario, Canada)
Mohammad Jamal Hossain (Computer Science and Information Technology, Patuakhali Science and Technology University, Dumki, Patuakhali, Bangladesh)
Md Abdul Masud (Computer Science and Information Technology, Patuakhali Science and Technology University, Dumki, Patuakhali, Bangladesh)
http://www.sciencedirect.com/science/article/pii/S2667345223000202
Android; Malware; Category classification; Dynamic analysis; Supervised ML; Ensemble
title Android malware classification using optimum feature selection and ensemble machine learning
topic Android
Malware
Category classification
Dynamic analysis
Supervised ML
Ensemble
url http://www.sciencedirect.com/science/article/pii/S2667345223000202