Malware detection based on semi-supervised learning with malware visualization

The traditional signature-based detection method requires detailed manual analysis to extract the signatures of malicious samples and a large amount of manual labeling to maintain the signature library, which incurs substantial time and resource costs and makes it difficult to adapt to the rap...


Bibliographic Details
Main Authors:	Tan Gao, Lan Zhao, Xudong Li, Wen Chen
Format:	Article
Language:	English
Published:	AIMS Press 2021-07-01
Series:	Mathematical Biosciences and Engineering
Subjects:	malicious sample detection; collaborative learning; feature fusion; noise robustness
Online Access:	https://www.aimspress.com/article/doi/10.3934/mbe.2021300?viewType=HTML

_version_	1818652066237644800
author	Tan Gao, Lan Zhao, Xudong Li, Wen Chen
author_facet	Tan Gao, Lan Zhao, Xudong Li, Wen Chen
author_sort	Tan Gao
collection	DOAJ
description	The traditional signature-based detection method requires detailed manual analysis to extract the signatures of malicious samples and a large amount of manual labeling to maintain the signature library, which incurs substantial time and resource costs and makes it difficult to adapt to the rapid generation and mutation of malware. Methods based on traditional machine learning often require a lot of time and resources for sample labeling, so a large stock of unlabeled samples accumulates that cannot be used directly. In view of these issues, this paper proposes an effective malware classification framework based on malware visualization and semi-supervised learning. The framework has three main parts: malware visualization, feature extraction, and a classification algorithm. First, binary files are processed directly through visualization, without disassembly, decompression, or decryption. Then the global and local features of the grayscale image are extracted and fused by a special feature fusion method to eliminate the exclusion between different feature variables. Finally, an improved collaborative learning algorithm is proposed that continuously trains and optimizes the classifier by introducing features of inexpensive unlabeled samples. The proposed framework was evaluated on two extensively researched benchmark datasets, Malimg and Microsoft. The results show that, compared with traditional machine learning algorithms, the improved collaborative learning algorithm not only reduces the cost of sample labeling but also continuously improves model performance as unlabeled samples are added, thereby achieving higher classification accuracy.
first_indexed	2024-12-17T02:16:06Z
format	Article
id	doaj.art-0c87bf4d9e3e4872be29094823d4ce70
institution	Directory of Open Access Journals
issn	1551-0018
language	English
last_indexed	2024-12-17T02:16:06Z
publishDate	2021-07-01
publisher	AIMS Press
record_format	Article
series	Mathematical Biosciences and Engineering
spelling	doaj.art-0c87bf4d9e3e4872be29094823d4ce70; 2022-12-21T22:07:24Z; eng; AIMS Press; Mathematical Biosciences and Engineering; ISSN 1551-0018; 2021-07-01; vol. 18, no. 5, pp. 5995-6011; doi:10.3934/mbe.2021300; Malware detection based on semi-supervised learning with malware visualization; Tan Gao (1), Lan Zhao (2), Xudong Li (1), Wen Chen (1); 1. School of Cyber Science and Engineering, Sichuan University, China; 2. Science and Technology on Electronic Information Control Laboratory, China; https://www.aimspress.com/article/doi/10.3934/mbe.2021300?viewType=HTML; malicious sample detection; collaborative learning; feature fusion; noise robustness
spellingShingle	Tan Gao; Lan Zhao; Xudong Li; Wen Chen; Malware detection based on semi-supervised learning with malware visualization; Mathematical Biosciences and Engineering; malicious sample detection; collaborative learning; feature fusion; noise robustness
title	Malware detection based on semi-supervised learning with malware visualization
title_full	Malware detection based on semi-supervised learning with malware visualization
title_fullStr	Malware detection based on semi-supervised learning with malware visualization
title_full_unstemmed	Malware detection based on semi-supervised learning with malware visualization
title_short	Malware detection based on semi-supervised learning with malware visualization
title_sort	malware detection based on semi supervised learning with malware visualization
topic	malicious sample detection; collaborative learning; feature fusion; noise robustness
url	https://www.aimspress.com/article/doi/10.3934/mbe.2021300?viewType=HTML
work_keys_str_mv	AT tangao malwaredetectionbasedonsemisupervisedlearningwithmalwarevisualization AT lanzhao malwaredetectionbasedonsemisupervisedlearningwithmalwarevisualization AT xudongli malwaredetectionbasedonsemisupervisedlearningwithmalwarevisualization AT wenchen malwaredetectionbasedonsemisupervisedlearningwithmalwarevisualization
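
The description field says that binary files are processed directly through visualization into a gray image. This record does not give the paper's actual procedure, so the following is only a minimal sketch of one common way to do that: treat every byte as one 8-bit gray pixel and wrap the byte stream into rows of a fixed width. The function names `bytes_to_gray_image` and `to_pgm`, the default width of 256, and the zero padding of the last row are all assumptions made for illustration, not details taken from the article.

```python
import math


def bytes_to_gray_image(data: bytes, width: int = 256) -> list[list[int]]:
    """Map each byte (0-255) to one gray pixel and wrap rows at `width`.

    Hypothetical helper: the width and the zero padding are assumptions.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    # Zero-pad the tail so that the final row is complete.
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def to_pgm(image: list[list[int]]) -> bytes:
    """Serialize the pixel rows as a binary PGM (P5) grayscale image."""
    height = len(image)
    width = len(image[0]) if image else 0
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(pixel for row in image for pixel in row)


# Stand-in for the raw bytes of an executable; no real sample is read here.
sample = bytes(range(10))
image = bytes_to_gray_image(sample, width=4)
pgm = to_pgm(image)
```

Because the bytes are read as-is, nothing in this sketch disassembles, decompresses, or decrypts the file. Feature extraction and the collaborative learning stage named in the description would operate on the resulting pixel grid and are not shown.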


Similar Items