Android malware classification using optimum feature selection and ensemble machine learning
The majority of smartphones on the market run on the Android operating system. Security has been a core concern with this platform since it allows users to install apps from unknown sources. With thousands of apps being produced and launched daily, malware detection using Machine Learning (ML) has a...
Main Authors: | Rejwana Islam, Moinul Islam Sayed, Sajal Saha, Mohammad Jamal Hossain, Md Abdul Masud |
---|---|
Format: | Article |
Language: | English |
Published: | KeAi Communications Co., Ltd. 2023-01-01 |
Series: | Internet of Things and Cyber-Physical Systems |
Subjects: | Android, Malware, Category classification, Dynamic analysis, Supervised ML, Ensemble |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2667345223000202 |
_version_ | 1797742197993373696 |
---|---|
author | Rejwana Islam; Moinul Islam Sayed; Sajal Saha; Mohammad Jamal Hossain; Md Abdul Masud |
author_facet | Rejwana Islam; Moinul Islam Sayed; Sajal Saha; Mohammad Jamal Hossain; Md Abdul Masud |
author_sort | Rejwana Islam |
collection | DOAJ |
description | The majority of smartphones on the market run on the Android operating system. Security has been a core concern with this platform since it allows users to install apps from unknown sources. With thousands of apps being produced and launched daily, malware detection using Machine Learning (ML) has attracted significant attention compared to traditional detection techniques. Despite academic and commercial efforts, developing an efficient and reliable method for classifying malware remains challenging. As a result, several datasets for malware analysis have been generated and made available during the past ten years. These datasets may contain static features, such as API calls, intents, and permissions, or dynamic features, such as logcat errors, shared memory, and system calls. Dynamic analysis is more resilient to code obfuscation. Though both binary and multi-class classification have been carried out in recent studies, the latter provides valuable insight into the nature of malware. Because each malware variant operates differently, identifying its category can help defend against it. Using the well-known ensemble ML approach called weighted voting, this study performed dynamic feature analysis for multi-class classification. Random Forest, K-Nearest Neighbors, Multi-Layer Perceptrons, Decision Trees, Support Vector Machines, and Logistic Regression are the classifiers studied in this ensemble model. We used a recent dataset named CCCS-CIC-AndMal-2020, which contains an extensive collection of Android applications and malware samples. A well-researched data preparation phase followed by weighted voting based on the R² scores of the ML classifiers achieves an accuracy of 95.0% even after excluding 60.2% of the features, outperforming all recent studies. |
first_indexed | 2024-03-12T14:37:33Z |
format | Article |
id | doaj.art-d2ff128763a149f2a2ea07bd0cd7a0d5 |
institution | Directory Open Access Journal |
issn | 2667-3452 |
language | English |
last_indexed | 2024-03-12T14:37:33Z |
publishDate | 2023-01-01 |
publisher | KeAi Communications Co., Ltd. |
record_format | Article |
series | Internet of Things and Cyber-Physical Systems |
spelling | doaj.art-d2ff128763a149f2a2ea07bd0cd7a0d5 2023-08-17T04:28:05Z eng KeAi Communications Co., Ltd. Internet of Things and Cyber-Physical Systems 2667-3452 2023-01-01 3 100 111 Android malware classification using optimum feature selection and ensemble machine learning. Rejwana Islam [0]; Moinul Islam Sayed [1]; Sajal Saha [2]; Mohammad Jamal Hossain [3]; Md Abdul Masud [4]. [0] Computer Science and Information Technology, Patuakhali Science and Technology University, Dumki, Patuakhali, Bangladesh; Corresponding author. [1] Computer Science, Western University, London, Ontario, Canada. [2] Computer Science, Western University, London, Ontario, Canada. [3] Computer Science and Information Technology, Patuakhali Science and Technology University, Dumki, Patuakhali, Bangladesh. [4] Computer Science and Information Technology, Patuakhali Science and Technology University, Dumki, Patuakhali, Bangladesh. The majority of smartphones on the market run on the Android operating system. Security has been a core concern with this platform since it allows users to install apps from unknown sources. With thousands of apps being produced and launched daily, malware detection using Machine Learning (ML) has attracted significant attention compared to traditional detection techniques. Despite academic and commercial efforts, developing an efficient and reliable method for classifying malware remains challenging. As a result, several datasets for malware analysis have been generated and made available during the past ten years. These datasets may contain static features, such as API calls, intents, and permissions, or dynamic features, such as logcat errors, shared memory, and system calls. Dynamic analysis is more resilient to code obfuscation. Though both binary and multi-class classification have been carried out in recent studies, the latter provides valuable insight into the nature of malware. Because each malware variant operates differently, identifying its category can help defend against it. Using the well-known ensemble ML approach called weighted voting, this study performed dynamic feature analysis for multi-class classification. Random Forest, K-Nearest Neighbors, Multi-Layer Perceptrons, Decision Trees, Support Vector Machines, and Logistic Regression are the classifiers studied in this ensemble model. We used a recent dataset named CCCS-CIC-AndMal-2020, which contains an extensive collection of Android applications and malware samples. A well-researched data preparation phase followed by weighted voting based on the R² scores of the ML classifiers achieves an accuracy of 95.0% even after excluding 60.2% of the features, outperforming all recent studies. http://www.sciencedirect.com/science/article/pii/S2667345223000202 Android; Malware; Category classification; Dynamic analysis; Supervised ML; Ensemble |
spellingShingle | Rejwana Islam Moinul Islam Sayed Sajal Saha Mohammad Jamal Hossain Md Abdul Masud Android malware classification using optimum feature selection and ensemble machine learning Internet of Things and Cyber-Physical Systems Android Malware Category classification Dynamic analysis Supervised ML Ensemble |
title | Android malware classification using optimum feature selection and ensemble machine learning |
title_full | Android malware classification using optimum feature selection and ensemble machine learning |
title_fullStr | Android malware classification using optimum feature selection and ensemble machine learning |
title_full_unstemmed | Android malware classification using optimum feature selection and ensemble machine learning |
title_short | Android malware classification using optimum feature selection and ensemble machine learning |
title_sort | android malware classification using optimum feature selection and ensemble machine learning |
topic | Android; Malware; Category classification; Dynamic analysis; Supervised ML; Ensemble |
url | http://www.sciencedirect.com/science/article/pii/S2667345223000202 |
work_keys_str_mv | AT rejwanaislam androidmalwareclassificationusingoptimumfeatureselectionandensemblemachinelearning AT moinulislamsayed androidmalwareclassificationusingoptimumfeatureselectionandensemblemachinelearning AT sajalsaha androidmalwareclassificationusingoptimumfeatureselectionandensemblemachinelearning AT mohammadjamalhossain androidmalwareclassificationusingoptimumfeatureselectionandensemblemachinelearning AT mdabdulmasud androidmalwareclassificationusingoptimumfeatureselectionandensemblemachinelearning |
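
The description field above mentions weighted voting with weights based on the R² scores of the individual classifiers. The record gives no implementation details, so the following is only a minimal sketch of one way such a scheme could work, not the authors' method. It assumes class labels are encoded as integers, that each classifier's R² is measured on a held-out validation set, and that negative R² scores are clipped to zero. The function names (`r2_score`, `weights_from_r2`, `weighted_vote`) and the short classifier keys are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def r2_score(y_true, y_pred):
    """Coefficient of determination on integer-encoded labels.

    Assumes y_true contains at least two distinct values, so the
    total sum of squares is non-zero.
    """
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def weights_from_r2(y_val, val_predictions):
    """Map each classifier name to a weight derived from its validation R2.

    Clipping negative scores to zero is an assumption of this sketch.
    """
    return {
        name: max(0.0, r2_score(y_val, preds))
        for name, preds in val_predictions.items()
    }


def weighted_vote(predictions, weights):
    """Hard weighted voting: per sample, pick the label with the largest
    summed classifier weight. Ties go to the smallest label."""
    n_samples = len(next(iter(predictions.values())))
    voted = []
    for i in range(n_samples):
        tally = defaultdict(float)
        for name, preds in predictions.items():
            tally[preds[i]] += weights[name]
        voted.append(max(sorted(tally), key=tally.get))
    return voted


# Hypothetical validation labels and per-classifier predictions.
y_val = [0, 1, 2, 1, 0, 2]
val_predictions = {
    "rf": [0, 1, 2, 1, 0, 2],
    "knn": [0, 1, 2, 1, 0, 1],
    "lr": [2, 0, 0, 2, 1, 0],
}
weights = weights_from_r2(y_val, val_predictions)

# Hypothetical predictions on three unseen samples.
test_predictions = {"rf": [0, 2, 1], "knn": [1, 2, 1], "lr": [1, 0, 0]}
final_labels = weighted_vote(test_predictions, weights)
print(weights)
print(final_labels)
```

In practice the prediction lists would come from the fitted base classifiers named in the description (Random Forest, K-Nearest Neighbors, and so on). A soft-voting variant that weights predicted class probabilities instead of hard labels is an equally plausible reading of the description.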