Dynamic Malware Analysis Based on API Sequence Semantic Fusion

Bibliographic Details
Main Authors: Sanfeng Zhang, Jiahao Wu, Mengzhe Zhang, Wang Yang
Format: Article
Language: English
Published: MDPI AG, 2023-05-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/13/11/6526

Description
Existing dynamic malware detection methods based on API call sequences ignore the semantic information of the API functions. Simply mapping each API to a numerical value does not reflect whether a function performs a query or a modification, or whether it relates to network communication, the file system, or other resources. In addition, detection performance is limited when API call sequences are very long. To address these issues, we propose Mal-ASSF, a novel malware detection model that fuses the semantic and sequence features of API calls. The API2Vec embedding method is used to obtain a reduced-dimensional representation of each API function. To capture the behavioral features of sequential segments, Balts is used to extract these features. To leverage the implicit semantic information of the API functions, the operation each function performs and the type of resource it operates on are extracted. These semantic and sequential features are then fused and processed by attention-related modules. Compared with existing methods, Mal-ASSF provides stronger semantic representation and better recognition of critical sequences within API call sequences. In an evaluation on a dataset of malware families, Mal-ASSF outperforms existing solutions by 3% to 5% in detection accuracy.
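The abstract does not say how the operation and resource type are derived from an API function, nor how the features are fused. As a minimal sketch, assuming a hypothetical keyword table (`OPERATIONS`, `RESOURCES`) and hypothetical two-dimensional stand-ins for learned API2Vec-style embeddings, the Python below splits Windows-style API names into words, one-hot encodes the operation and resource type, concatenates them with each call's embedding, and pools the sequence with a simple softmax attention. It illustrates the general idea of semantic/sequence feature fusion only; it is not the Mal-ASSF architecture, it does not model the Balts segment features, and its attention scorer is an untrained weight vector.

```python
import math
import re

# HYPOTHETICAL keyword tables: the abstract does not list Mal-ASSF's actual
# operation or resource categories, so these are illustrative only.
OPERATIONS = ["query", "create", "open", "read", "write",
              "delete", "set", "get", "close", "connect"]
RESOURCES = {
    "registry": ["reg"],
    "file": ["file", "directory"],
    "network": ["internet", "url", "socket", "connect", "send", "recv", "http"],
    "process": ["process", "thread"],
}
RESOURCE_NAMES = list(RESOURCES)


def split_api_name(name):
    """Split a Windows-style API name into lowercase words (heuristic).

    Strips an Nt/Zw prefix and an A/W/Ex/ExA/ExW suffix first, e.g.
    'RegQueryValueExW' -> ['reg', 'query', 'value'].
    """
    name = re.sub(r"^(Nt|Zw)(?=[A-Z])", "", name)
    name = re.sub(r"(ExW|ExA|Ex|W|A)$", "", name)
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", name)]


def semantic_features(name):
    """One-hot operation vector + one-hot resource vector for one API name.

    The last slot of each vector means "other" (no keyword matched).
    """
    words = split_api_name(name)
    op = next((i for i, o in enumerate(OPERATIONS) if o in words), len(OPERATIONS))
    res = next((i for i, r in enumerate(RESOURCE_NAMES)
                if any(k in words for k in RESOURCES[r])), len(RESOURCE_NAMES))
    op_vec = [0.0] * (len(OPERATIONS) + 1)
    res_vec = [0.0] * (len(RESOURCE_NAMES) + 1)
    op_vec[op] = 1.0
    res_vec[res] = 1.0
    return op_vec + res_vec


def fuse(api_calls, embeddings):
    """Concatenate each call's embedding with its semantic one-hot features."""
    return [embeddings[name] + semantic_features(name) for name in api_calls]


def attention_pool(vectors, weight):
    """Softmax-weighted sum of vectors; each score is a dot product with `weight`."""
    scores = [sum(w * x for w, x in zip(weight, v)) for v in vectors]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(vectors[0])
    pooled = [sum(a * v[j] for a, v in zip(alphas, vectors)) for j in range(dim)]
    return pooled, alphas


if __name__ == "__main__":
    calls = ["RegQueryValueExW", "CreateFileW", "InternetOpenUrlA"]
    # HYPOTHETICAL 2-d stand-ins for learned API embeddings.
    emb = {"RegQueryValueExW": [0.1, 0.3], "CreateFileW": [0.5, -0.2],
           "InternetOpenUrlA": [-0.4, 0.9]}
    fused = fuse(calls, emb)
    weight = [0.0] * len(fused[0])  # untrained scorer, so attention is uniform
    pooled, alphas = attention_pool(fused, weight)
    print(len(fused[0]), [round(a, 3) for a in alphas])
    # prints: 18 [0.333, 0.333, 0.333]
```

Concatenation is the simplest way to combine the two views; a learned projection or gating layer would be natural alternatives. The extra "other" slot in each one-hot vector keeps API names that match no keyword from being silently mapped onto a real category.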

Affiliation (all authors): School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China
Citation: Applied Sciences 2023, 13(11), 6526
ISSN: 2076-3417
DOI: 10.3390/app13116526
Keywords: malware; dynamic analysis; API call sequence; semantic feature; fusion