Instruction2vec: Efficient Preprocessor of Assembly Code to Detect Software Weakness with CNN

Potential software weakness, which can lead to exploitable security vulnerabilities, continues to pose a risk to computer systems. According to Common Vulnerabilities and Exposures, 14,714 vulnerabilities were reported in 2017, more than twice the number reported in 2016. Automated vulnerability detec...


Bibliographic Details
Main Authors: Yongjun Lee, Hyun Kwon, Sang-Hoon Choi, Seung-Ho Lim, Sung Hoon Baek, Ki-Woong Park
Format: Article
Language: English
Published: MDPI AG 2019-09-01
Series: Applied Sciences
Subjects: binary analysis; software weakness; convolutional neural network; word2vec
Online Access: https://www.mdpi.com/2076-3417/9/19/4086
collection DOAJ
description Potential software weakness, which can lead to exploitable security vulnerabilities, continues to pose a risk to computer systems. According to Common Vulnerabilities and Exposures, 14,714 vulnerabilities were reported in 2017, more than twice the number reported in 2016. Automated vulnerability detection has been recommended as an efficient way to find vulnerabilities. Among detection techniques, static binary analysis detects software weakness based on existing patterns or rules, which makes it difficult to add and patch rules whenever an unknown vulnerability is encountered. To overcome this limitation, we propose a new method, Instruction2vec, an improved static binary analysis technique using machine learning. Our framework consists of two steps: (1) it models assembly code efficiently using Instruction2vec, based on Word2vec; and (2) it learns the features of software weakness code using the feature extraction of Text-CNN, without creating patterns or rules, and detects new software weakness. We compared the preprocessing performance of three frameworks (Instruction2vec, Word2vec, and Binary2img) to assess the efficiency of Instruction2vec. We used the Juliet Test Suite, particularly the part related to Common Weakness Enumeration (CWE)-121, for training and Securely Taking On New Executable Software of Uncertain Provenance (STONESOUP) for testing. Experimental results show that the proposed scheme detects software vulnerabilities in assembly code with an accuracy of 91%.
id doaj.art-2673aff9d850493faf78130e1c28ed05
institution Directory Open Access Journal
issn 2076-3417
spelling doaj.art-2673aff9d850493faf78130e1c28ed05 (2022-12-22T01:55:35Z)
citation Applied Sciences, vol. 9, no. 19, article 4086, MDPI AG, 2019-09-01
doi 10.3390/app9194086
affiliation Yongjun Lee: Information Security at Graduate School of Information Security, Korea University, Seoul 02841, Korea
Hyun Kwon: School of Computing, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea
Sang-Hoon Choi: Department of Computer and Information Security, Sejong University, 209, Neungdong-ro, Gwangjin-gu, Seoul 05006, Korea
Seung-Ho Lim: Division of Computer and Electronic Systems Engineering, Hankuk University of Foreign Studies, Seoul 02450, Korea
Sung Hoon Baek: Department of Computer System Engineering, Jungwon University, Chungcheongbuk-do 28024, Korea
Ki-Woong Park: Department of Computer and Information Security, Sejong University, 209, Neungdong-ro, Gwangjin-gu, Seoul 05006, Korea
topic binary analysis
software weakness
convolutional neural network
word2vec