Android Malware Classification Based on Fuzzy Hashing Visualization

The proliferation of Android-based devices has brought about an unprecedented surge in mobile application usage, making the Android ecosystem a prime target for cybercriminals. In this paper, a new method for Android malware classification is proposed. The method implements a convolutional neural network for malware classification using images.

Bibliographic Details
Main Authors: Horacio Rodriguez-Bazan, Grigori Sidorov, Ponciano Jorge Escamilla-Ambrosio
Format: Article
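Language: English
Published: MDPI AG 2023-11-01
Series: Machine Learning and Knowledge Extraction
Subjects: android malware; convolutional neural network; deep learning; fuzzy hashing; malware classification; natural language processing
Online Access: https://www.mdpi.com/2504-4990/5/4/88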
collection DOAJ
description The proliferation of Android-based devices has brought about an unprecedented surge in mobile application usage, making the Android ecosystem a prime target for cybercriminals. In this paper, a new method for Android malware classification is proposed. The method implements a convolutional neural network for malware classification using images. The research presents a novel approach to transforming the Android Application Package (APK) into a grayscale image. The image creation utilizes natural language processing techniques for text cleaning, extraction, and fuzzy hashing to represent the decompiled code from the APK in a set of hashes after preprocessing, where the image is composed of n fuzzy hashes that represent an APK. The method was tested on an Android malware dataset with 15,493 samples of five malware types. The proposed method showed an increase in accuracy compared to others in the literature, achieving up to 98.24% in the classification task.
first_indexed 2024-03-08T20:35:24Z
id doaj.art-f455497ed1744a429305bedcafcd9f09
institution Directory Open Access Journal
issn 2504-4990
last_indexed 2024-03-08T20:35:24Z
spelling doaj.art-f455497ed1744a429305bedcafcd9f09 (record updated 2023-12-22T14:22:13Z)
Citation: Machine Learning and Knowledge Extraction, vol. 5, no. 4, pp. 1826-1847, 2023-11-01
DOI: 10.3390/make5040088
Affiliation (all three authors): Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional (IPN), Av. Juan de Dios Batiz, s/n, Mexico City 07320, Mexico
topic android malware
convolutional neural network
deep learning
fuzzy hashing
malware classification
natural language processing