Impact of Code Deobfuscation and Feature Interaction in Android Malware Detection

Bibliographic Details
Main Authors: Yun-Chung Chen, Hong-Yen Chen, Takeshi Takahashi, Bo Sun, Tsung-Nan Lin
Format: Article
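Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects: Android malware detection; classification; code deobfuscation; feature interaction; machine learning; static analysis
Online Access: https://ieeexplore.ieee.org/document/9529344/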
author Yun-Chung Chen
Hong-Yen Chen
Takeshi Takahashi
Bo Sun
Tsung-Nan Lin
collection DOAJ
description With more than three million applications already in the Android marketplace, various malware detection systems based on machine learning have been proposed to prevent attacks from cybercriminals; most of these systems use static analyses to extract application features. However, many features generated by static analyses can be easily thwarted by obfuscation techniques. Therefore, several researchers have addressed this obfuscation problem with obfuscation-invariant features. However, to the best of our knowledge, no researcher has utilized deobfuscation techniques. To this end, we adopt a code deobfuscation technique with an Android malware detection system and investigate its effects. Experimental results indicate that code deobfuscation can successfully retrieve useful information concealed by obfuscation. Further, we propose interaction terms based on identified feature interactions. The proposed interaction terms aim to eliminate the interference caused by the size of the application and other features because many feature values are correlated to the size of the application. In addition, the experimental results indicate that these interaction terms have a high ranking in terms of feature importance values. Our proposed Android malware detection model achieves 99.55% accuracy and a 94.61% F1-score with the well-known Drebin dataset, which is better than the performance of previous works.
first_indexed 2024-12-17T01:06:52Z
format Article
id doaj.art-5badca29484f45d6a833f0755e01f54f
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-17T01:06:52Z
publishDate 2021-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-5badca29484f45d6a833f0755e01f54f
2022-12-21T22:09:14Z
eng
IEEE
IEEE Access, ISSN 2169-3536
2021-01-01, vol. 9, pp. 123208-123219
doi:10.1109/ACCESS.2021.3110408 (IEEE document 9529344)
Yun-Chung Chen, https://orcid.org/0000-0003-4207-5695, Graduate Institute of Electrical Engineering, National Taiwan University, Taipei, Taiwan
Hong-Yen Chen, https://orcid.org/0000-0001-5638-8030, Graduate Institute of Communication Engineering, National Taiwan University, Taipei, Taiwan
Takeshi Takahashi, https://orcid.org/0000-0002-6477-7770, National Institute of Information and Communications Technology, Koganei, Tokyo, Japan
Bo Sun, https://orcid.org/0000-0002-7822-3672, National Institute of Information and Communications Technology, Koganei, Tokyo, Japan
Tsung-Nan Lin, https://orcid.org/0000-0001-5659-1194, Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan
title Impact of Code Deobfuscation and Feature Interaction in Android Malware Detection
topic Android malware detection
classification
code deobfuscation
feature interaction
machine learning
static analysis
url https://ieeexplore.ieee.org/document/9529344/