SAViP: Semantic-Aware Vulnerability Prediction for Binary Programs with Neural Networks

Vulnerability prediction, in which static analysis is leveraged to predict the vulnerabilities of binary programs, has become a popular research topic. Traditional vulnerability prediction methods depend on vulnerability patterns, which must be predefined by security experts in a time-consuming mann...

Bibliographic Details
Main Authors:	Xu Zhou, Bingjie Duan, Xugang Wu, Pengfei Wang
Format:	Article
Language:	English
Published:	MDPI AG 2023-02-01
Series:	Applied Sciences
Subjects:	vulnerability prediction; binary program; neural networks; software security
Online Access:	https://www.mdpi.com/2076-3417/13/4/2271

_version_	1827758806203891712
author	Xu Zhou, Bingjie Duan, Xugang Wu, Pengfei Wang
author_facet	Xu Zhou, Bingjie Duan, Xugang Wu, Pengfei Wang
author_sort	Xu Zhou
collection	DOAJ
description	Vulnerability prediction, in which static analysis is leveraged to predict the vulnerabilities of binary programs, has become a popular research topic. Traditional vulnerability prediction methods depend on vulnerability patterns, which must be predefined by security experts in a time-consuming manner. The development of Artificial Intelligence (AI) has yielded new options for vulnerability prediction. Neural networks allow vulnerability patterns to be learned automatically. However, current works extract only one or two types of features and use traditional models such as word2vec, which results in the loss of much instruction-level information. In this paper, we propose a model named SAViP to predict vulnerabilities in binary programs. To fully extract binary information, we integrate three kinds of features: semantic, statistical, and structural features. For semantic features, we apply the Masked Language Model (MLM) pre-training task of the RoBERTa model to the assembly code to build our language model. Using this model, we innovatively combine the beginning token and the operation-code token to create the instruction embedding. For the statistical features, we design a 56-dimensional feature vector that contains 43 kinds of instructions. For the structural features, we improve the ability of the structure2vec network to obtain the characteristic of the network by emphasizing node self-attention. Through these optimizations, we significantly increase the accuracy of vulnerability prediction over existing methods. Our experiments show that SAViP achieves a recall of 77.85% and Top 100∼600 accuracies all above 95%. The results are 10% and 13% higher than those of the state-of-the-art V-Fuzz, respectively.
first_indexed	2024-03-11T09:12:41Z
format	Article
id	doaj.art-1d6e2c0c54a74c23ab8e909a190c1ba2
institution	Directory Open Access Journal
issn	2076-3417
language	English
last_indexed	2024-03-11T09:12:41Z
publishDate	2023-02-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-1d6e2c0c54a74c23ab8e909a190c1ba2 2023-11-16T18:53:21Z eng MDPI AG Applied Sciences 2076-3417 2023-02-01 13 4 2271 10.3390/app13042271 SAViP: Semantic-Aware Vulnerability Prediction for Binary Programs with Neural Networks Xu Zhou (0), Bingjie Duan (1), Xugang Wu (2), Pengfei Wang (3); (0) to (3): College of Computer, National University of Defense Technology, Changsha 410073, China. Vulnerability prediction, in which static analysis is leveraged to predict the vulnerabilities of binary programs, has become a popular research topic. Traditional vulnerability prediction methods depend on vulnerability patterns, which must be predefined by security experts in a time-consuming manner. The development of Artificial Intelligence (AI) has yielded new options for vulnerability prediction. Neural networks allow vulnerability patterns to be learned automatically. However, current works extract only one or two types of features and use traditional models such as word2vec, which results in the loss of much instruction-level information. In this paper, we propose a model named SAViP to predict vulnerabilities in binary programs. To fully extract binary information, we integrate three kinds of features: semantic, statistical, and structural features. For semantic features, we apply the Masked Language Model (MLM) pre-training task of the RoBERTa model to the assembly code to build our language model. Using this model, we innovatively combine the beginning token and the operation-code token to create the instruction embedding. For the statistical features, we design a 56-dimensional feature vector that contains 43 kinds of instructions. For the structural features, we improve the ability of the structure2vec network to obtain the characteristic of the network by emphasizing node self-attention. Through these optimizations, we significantly increase the accuracy of vulnerability prediction over existing methods. Our experiments show that SAViP achieves a recall of 77.85% and Top 100∼600 accuracies all above 95%. The results are 10% and 13% higher than those of the state-of-the-art V-Fuzz, respectively. https://www.mdpi.com/2076-3417/13/4/2271 vulnerability prediction; binary program; neural networks; software security
spellingShingle	Xu Zhou, Bingjie Duan, Xugang Wu, Pengfei Wang; SAViP: Semantic-Aware Vulnerability Prediction for Binary Programs with Neural Networks; Applied Sciences; vulnerability prediction; binary program; neural networks; software security
title	SAViP: Semantic-Aware Vulnerability Prediction for Binary Programs with Neural Networks
topic	vulnerability prediction; binary program; neural networks; software security
url	https://www.mdpi.com/2076-3417/13/4/2271
