A Malware Detection Framework Based on Semantic Information of Behavioral Features

As the amount of malware has grown rapidly in recent years, it has become the most dominant attack method in network security. Learning execution behavior, especially Application Programming Interface (API) call sequences, has been shown to be effective for malware detection. However, it is troubles...


Bibliographic Details
Authors: Yuxin Zhang, Shumian Yang, Lijuan Xu, Xin Li, Dawei Zhao
Format: Article
Language: English
Published: MDPI AG 2023-11-01
Series: Applied Sciences
Subjects: network security, dynamic analysis, API sequences, deep learning, malware detection
Online access: https://www.mdpi.com/2076-3417/13/22/12528
_version_ 1827640661793308672
author Yuxin Zhang
Shumian Yang
Lijuan Xu
Xin Li
Dawei Zhao
author_sort Yuxin Zhang
collection DOAJ
description As the volume of malware has grown rapidly in recent years, it has become the dominant attack method in network security. Learning execution behavior, especially Application Programming Interface (API) call sequences, has been shown to be effective for malware detection. In practice, however, it is difficult to mine API call features adequately. Most current methods analyze only a single feature or analyze the features inadequately, ignoring structural and semantic features; this loses information and reduces accuracy. To address these problems, we propose a novel malware detection method based on the semantic information of behavioral features. First, we preprocess the sequence of API function calls to reduce redundant information. Then, we obtain a vectorized representation of the API call sequence with a word embedding model, and encode each API call name by analyzing it to characterize the name's semantic structure information and statistical information. Finally, a malware detector consisting of a CNN and a bidirectional GRU, which can better capture the local and global features between API calls, is used for detection. We evaluate the proposed model on a publicly available dataset provided by a third party. The experimental results show that the proposed method outperforms the baseline method. With this combined neural network architecture, the proposed model attains a detection accuracy of 0.9828 and an F1-score of 0.9827.
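The first two steps named in the description (reducing redundancy in the API call sequence and analyzing API call names) can be illustrated with a minimal Python sketch. This record does not specify how the authors implement either step, so the sketch rests on two assumptions: that "redundant information" means runs of consecutive identical calls, and that name analysis starts by splitting a CamelCase API name into its component words. The function names `collapse_repeats` and `split_api_name` are hypothetical and do not come from the paper.

```python
import re
from itertools import groupby

def collapse_repeats(calls):
    """Keep one call from each run of consecutive identical API calls.

    Assumption: this stands in for the paper's redundancy reduction,
    whose exact rule is not given in this record.
    """
    return [name for name, _ in groupby(calls)]

# Matches, in order of preference: an acronym followed by a capitalized
# word (e.g. "WSA" in "WSAStartup"), a capitalized or lowercase word,
# a trailing run of capitals, or a run of digits.
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")

def split_api_name(name):
    """Split a CamelCase API name into lowercase word tokens.

    Assumption: a simple stand-in for analyzing the semantic
    structure of an API name.
    """
    return [word.lower() for word in _WORD.findall(name)]

if __name__ == "__main__":
    trace = ["NtOpenFile", "NtReadFile", "NtReadFile", "NtReadFile", "NtClose"]
    for call in collapse_repeats(trace):
        print(call, split_api_name(call))
```

The resulting word tokens could then be fed to a word embedding model, as the description outlines; the embedding and the CNN plus bidirectional GRU detector are outside the scope of this sketch.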
first_indexed 2024-03-09T17:02:45Z
format Article
id doaj.art-a87c6078c23a4a638f97ce3bd5fcd139
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-09T17:02:45Z
publishDate 2023-11-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-a87c6078c23a4a638f97ce3bd5fcd139; 2023-11-24T14:28:18Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2023-11-01; vol. 13, no. 22, art. 12528; DOI 10.3390/app132212528
A Malware Detection Framework Based on Semantic Information of Behavioral Features
Yuxin Zhang (0), Shumian Yang (1), Lijuan Xu (2), Xin Li (3), Dawei Zhao (4)
Affiliation (all authors): Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 250014, China
https://www.mdpi.com/2076-3417/13/22/12528
Keywords: network security; dynamic analysis; API sequences; deep learning; malware detection
title A Malware Detection Framework Based on Semantic Information of Behavioral Features
title_sort malware detection framework based on semantic information of behavioral features
topic network security
dynamic analysis
API sequences
deep learning
malware detection
url https://www.mdpi.com/2076-3417/13/22/12528