Dynamic Malware Analysis Based on API Sequence Semantic Fusion
The existing dynamic malware detection methods based on API call sequences ignore the semantic information of functions. Simply mapping API to numerical values does not reflect whether a function has performed a query or modification operation, whether it is related to network communication, the fil...
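The contrast drawn in the abstract, between mapping an API name to a bare number and recording what the call does, can be illustrated with a minimal sketch. This is not the Mal-ASSF implementation: the keyword tables `OPERATIONS` and `RESOURCES`, the tokenizer, and the helper names `index_encode` and `semantic_tags` are all hypothetical and chosen only for illustration.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical keyword tables (assumptions for illustration, not from the paper).
OPERATIONS: Dict[str, Set[str]] = {
    "query": {"query", "get", "find", "read", "open"},
    "modify": {"set", "write", "delete", "create", "send"},
}
RESOURCES: Dict[str, Set[str]] = {
    "file": {"file", "directory"},
    "registry": {"reg", "key"},
    "network": {"internet", "socket", "http", "url"},
    "process": {"process", "thread"},
}


def tokens(api_name: str) -> List[str]:
    """Split a CamelCase API name into lowercase word tokens."""
    parts = re.findall(r"[A-Z][a-z]+|[A-Z]+(?![a-z])|[a-z]+", api_name)
    return [p.lower() for p in parts]


def classify(toks: List[str], table: Dict[str, Set[str]]) -> str:
    """Return the first label whose keywords overlap the tokens, else 'other'."""
    for label, keywords in table.items():
        if keywords & set(toks):
            return label
    return "other"


def semantic_tags(api_name: str) -> Tuple[str, str]:
    """Tag an API name with an (operation, resource type) pair."""
    toks = tokens(api_name)
    return classify(toks, OPERATIONS), classify(toks, RESOURCES)


def index_encode(sequence: List[str]) -> List[int]:
    """Plain numeric mapping: each distinct API name gets the next free integer."""
    vocab: Dict[str, int] = {}
    return [vocab.setdefault(name, len(vocab)) for name in sequence]


if __name__ == "__main__":
    calls = ["NtQueryDirectoryFile", "NtWriteFile", "RegSetValueExA"]
    print(index_encode(calls))                # integers carry no meaning by themselves
    print([semantic_tags(c) for c in calls])  # operation and resource are explicit
```

The integer codes only distinguish one API from another, while the tag pairs state whether a call queries or modifies and which kind of resource it touches. A real system would need a far more careful mapping than keyword matching on names.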
Main Authors: | Sanfeng Zhang, Jiahao Wu, Mengzhe Zhang, Wang Yang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2023-05-01 |
Series: | Applied Sciences |
Subjects: | malware, dynamic analysis, API call sequence, semantic feature, fusion |
Online Access: | https://www.mdpi.com/2076-3417/13/11/6526 |
_version_ | 1797597893631148032 |
---|---|
author | Sanfeng Zhang Jiahao Wu Mengzhe Zhang Wang Yang |
author_facet | Sanfeng Zhang Jiahao Wu Mengzhe Zhang Wang Yang |
author_sort | Sanfeng Zhang |
collection | DOAJ |
description | The existing dynamic malware detection methods based on API call sequences ignore the semantic information of functions. Simply mapping API to numerical values does not reflect whether a function has performed a query or modification operation, whether it is related to network communication, the file system, or other factors. Additionally, the detection performance is limited when the size of the API call sequence is too large. To address this issue, we propose Mal-ASSF, a novel malware detection model that fuses the semantic and sequence features of the API calls. The API2Vec embedding method is used to obtain the dimensionality reduction representation of the API function. To capture the behavioral features of sequential segments, Balts is used to extract the features. To leverage the implicit semantic information of the API functions, the operation and the type of resource operated by the API functions are extracted. These semantic and sequential features are then fused and processed by the attention-related modules. In comparison with the existing methods, Mal-ASSF boasts superior capabilities in terms of semantic representation and recognition of critical sequences within API call sequences. According to the evaluation with a dataset of malware families, the experimental results show that Mal-ASSF outperforms existing solutions by 3% to 5% in detection accuracy. |
first_indexed | 2024-03-11T03:11:49Z |
format | Article |
id | doaj.art-c64fba66987e42628b4aaaf4fee74c00 |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-11T03:11:49Z |
publishDate | 2023-05-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-c64fba66987e42628b4aaaf4fee74c00; 2023-11-18T07:33:19Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2023-05-01; 13; 11; 6526; 10.3390/app13116526; Dynamic Malware Analysis Based on API Sequence Semantic Fusion; Sanfeng Zhang (0); Jiahao Wu (1); Mengzhe Zhang (2); Wang Yang (3); (0) School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China; (1) School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China; (2) School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China; (3) School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China; The existing dynamic malware detection methods based on API call sequences ignore the semantic information of functions. Simply mapping API to numerical values does not reflect whether a function has performed a query or modification operation, whether it is related to network communication, the file system, or other factors. Additionally, the detection performance is limited when the size of the API call sequence is too large. To address this issue, we propose Mal-ASSF, a novel malware detection model that fuses the semantic and sequence features of the API calls. The API2Vec embedding method is used to obtain the dimensionality reduction representation of the API function. To capture the behavioral features of sequential segments, Balts is used to extract the features. To leverage the implicit semantic information of the API functions, the operation and the type of resource operated by the API functions are extracted. These semantic and sequential features are then fused and processed by the attention-related modules. In comparison with the existing methods, Mal-ASSF boasts superior capabilities in terms of semantic representation and recognition of critical sequences within API call sequences. According to the evaluation with a dataset of malware families, the experimental results show that Mal-ASSF outperforms existing solutions by 3% to 5% in detection accuracy.; https://www.mdpi.com/2076-3417/13/11/6526; malware; dynamic analysis; API call sequence; semantic feature; fusion |
spellingShingle | Sanfeng Zhang Jiahao Wu Mengzhe Zhang Wang Yang Dynamic Malware Analysis Based on API Sequence Semantic Fusion Applied Sciences malware dynamic analysis API call sequence semantic feature fusion |
title | Dynamic Malware Analysis Based on API Sequence Semantic Fusion |
title_full | Dynamic Malware Analysis Based on API Sequence Semantic Fusion |
title_fullStr | Dynamic Malware Analysis Based on API Sequence Semantic Fusion |
title_full_unstemmed | Dynamic Malware Analysis Based on API Sequence Semantic Fusion |
title_short | Dynamic Malware Analysis Based on API Sequence Semantic Fusion |
title_sort | dynamic malware analysis based on api sequence semantic fusion |
topic | malware dynamic analysis API call sequence semantic feature fusion |
url | https://www.mdpi.com/2076-3417/13/11/6526 |
work_keys_str_mv | AT sanfengzhang dynamicmalwareanalysisbasedonapisequencesemanticfusion AT jiahaowu dynamicmalwareanalysisbasedonapisequencesemanticfusion AT mengzhezhang dynamicmalwareanalysisbasedonapisequencesemanticfusion AT wangyang dynamicmalwareanalysisbasedonapisequencesemanticfusion |