Instruction2vec: Efficient Preprocessor of Assembly Code to Detect Software Weakness with CNN
Potential software weakness, which can lead to exploitable security vulnerabilities, continues to pose a risk to computer systems. According to Common Vulnerabilities and Exposures (CVE), 14,714 vulnerabilities were reported in 2017, more than twice the number reported in 2016. Automated vulnerability detec...
Main Authors: | Yongjun Lee, Hyun Kwon, Sang-Hoon Choi, Seung-Ho Lim, Sung Hoon Baek, Ki-Woong Park |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2019-09-01 |
Series: | Applied Sciences |
Subjects: | binary analysis; software weakness; convolutional neural network; word2vec |
Online Access: | https://www.mdpi.com/2076-3417/9/19/4086 |
_version_ | 1818042688932413440 |
---|---|
author | Yongjun Lee; Hyun Kwon; Sang-Hoon Choi; Seung-Ho Lim; Sung Hoon Baek; Ki-Woong Park |
author_sort | Yongjun Lee |
collection | DOAJ |
description | Potential software weakness, which can lead to exploitable security vulnerabilities, continues to pose a risk to computer systems. According to Common Vulnerabilities and Exposures (CVE), 14,714 vulnerabilities were reported in 2017, more than twice the number reported in 2016. Automated vulnerability detection was recommended to efficiently detect vulnerabilities. Among detection techniques, static binary analysis detects software weakness by matching existing patterns or rules, which makes it difficult to add and patch new rules whenever an unknown vulnerability is encountered. To overcome this limitation, we propose Instruction2vec, an improved static binary analysis technique using machine learning. Our framework consists of two steps: (1) it models assembly code efficiently using Instruction2vec, which is based on Word2vec; and (2) it learns the features of software weakness code using the feature extraction of Text-CNN, without creating patterns or rules, and detects new software weakness. To assess the efficiency of Instruction2vec, we compared the preprocessing performance of three frameworks: Instruction2vec, Word2vec, and Binary2img. We used the Juliet Test Suite, particularly the part related to Common Weakness Enumeration (CWE)-121, for training and Securely Taking On New Executable Software of Uncertain Provenance (STONESOUP) for testing. Experimental results show that the proposed scheme can detect software vulnerabilities in assembly code with an accuracy of 91%. |
first_indexed | 2024-12-10T08:50:18Z |
format | Article |
id | doaj.art-2673aff9d850493faf78130e1c28ed05 |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-12-10T08:50:18Z |
publishDate | 2019-09-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-2673aff9d850493faf78130e1c28ed05; 2022-12-22T01:55:35Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2019-09-01; vol. 9, no. 19, art. 4086; 10.3390/app9194086; app9194086; Instruction2vec: Efficient Preprocessor of Assembly Code to Detect Software Weakness with CNN; Yongjun Lee [0]; Hyun Kwon [1]; Sang-Hoon Choi [2]; Seung-Ho Lim [3]; Sung Hoon Baek [4]; Ki-Woong Park [5]; [0] Information Security at Graduate School of Information Security, Korea University, Seoul 02841, Korea; [1] School of Computing, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea; [2] Department of Computer and Information Security, Sejong University, 209, Neungdong-ro, Gwangjin-gu, Seoul 05006, Korea; [3] Division of Computer and Electronic Systems Engineering, Hankuk University of Foreign Studies, Seoul 02450, Korea; [4] Department of Computer System Engineering, Jungwon University, Chungcheongbuk-do 28024, Korea; [5] Department of Computer and Information Security, Sejong University, 209, Neungdong-ro, Gwangjin-gu, Seoul 05006, Korea; Potential software weakness, which can lead to exploitable security vulnerabilities, continues to pose a risk to computer systems. According to Common Vulnerabilities and Exposures (CVE), 14,714 vulnerabilities were reported in 2017, more than twice the number reported in 2016. Automated vulnerability detection was recommended to efficiently detect vulnerabilities. Among detection techniques, static binary analysis detects software weakness by matching existing patterns or rules, which makes it difficult to add and patch new rules whenever an unknown vulnerability is encountered. To overcome this limitation, we propose Instruction2vec, an improved static binary analysis technique using machine learning. Our framework consists of two steps: (1) it models assembly code efficiently using Instruction2vec, which is based on Word2vec; and (2) it learns the features of software weakness code using the feature extraction of Text-CNN, without creating patterns or rules, and detects new software weakness. To assess the efficiency of Instruction2vec, we compared the preprocessing performance of three frameworks: Instruction2vec, Word2vec, and Binary2img. We used the Juliet Test Suite, particularly the part related to Common Weakness Enumeration (CWE)-121, for training and Securely Taking On New Executable Software of Uncertain Provenance (STONESOUP) for testing. Experimental results show that the proposed scheme can detect software vulnerabilities in assembly code with an accuracy of 91%.; https://www.mdpi.com/2076-3417/9/19/4086; binary analysis; software weakness; convolutional neural network; word2vec |
title | Instruction2vec: Efficient Preprocessor of Assembly Code to Detect Software Weakness with CNN |
title_sort | instruction2vec efficient preprocessor of assembly code to detect software weakness with cnn |
topic | binary analysis; software weakness; convolutional neural network; word2vec |
url | https://www.mdpi.com/2076-3417/9/19/4086 |
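The two-step framework named in the description (a Word2vec-based instruction embedding followed by Text-CNN feature extraction) can be illustrated with a minimal, dependency-free Python sketch. Everything concrete here is an assumption made for illustration and is not stated in this record: the fixed layout of nine token slots per instruction (one opcode plus two operands of up to four tokens each), the toy embedding size, the random lookup table standing in for a trained Word2vec model, and the untrained convolution kernel.

```python
import random
import re

EMBED_DIM = 4      # assumed toy embedding size
PAD = "<pad>"      # filler token for unused slots (assumption)

def tokenize(instruction):
    """Split one assembly instruction into 9 slots: opcode + 2 operands x 4 tokens."""
    opcode, _, rest = instruction.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",") if op.strip()][:2]
    slots = [opcode]
    for i in range(2):
        # Keep registers, immediates and arithmetic signs; drop brackets.
        parts = re.findall(r"\w+|[+\-*]", operands[i])[:4] if i < len(operands) else []
        slots.extend(parts + [PAD] * (4 - len(parts)))
    return slots

def make_table(tokens, dim=EMBED_DIM, seed=0):
    """Hypothetical stand-in for a trained Word2vec table: one random vector per token."""
    rng = random.Random(seed)
    table = {tok: [rng.uniform(-1, 1) for _ in range(dim)] for tok in sorted(set(tokens))}
    table[PAD] = [0.0] * dim   # padding contributes nothing
    return table

def instruction_vector(instruction, table):
    """Concatenate the slot embeddings into one fixed-length vector."""
    return [x for tok in tokenize(instruction) for x in table[tok]]

def conv_max_pool(matrix, kernel, bias=0.0):
    """One Text-CNN feature: slide the kernel over consecutive rows, ReLU, max over time."""
    h = len(kernel)
    best = 0.0   # starting at 0 makes the running max equal to max(ReLU(activation))
    for start in range(len(matrix) - h + 1):
        act = bias + sum(k * x
                         for krow, row in zip(kernel, matrix[start:start + h])
                         for k, x in zip(krow, row))
        best = max(best, act)
    return best

# Toy function body; the instructions are illustrative, not taken from the datasets.
code = ["push ebp", "mov ebp, esp", "sub esp, 0x40", "mov eax, [ebp-0x8]", "ret"]
table = make_table([tok for ins in code for tok in tokenize(ins)])
matrix = [instruction_vector(ins, table) for ins in code]   # 5 rows x 36 columns

rng = random.Random(1)
kernel = [[rng.uniform(-1, 1) for _ in range(len(matrix[0]))] for _ in range(3)]
feature = conv_max_pool(matrix, kernel)   # window of 3 instructions
print(len(matrix), len(matrix[0]), feature >= 0.0)
```

In a full Text-CNN, many such kernels with different window heights would each produce one pooled feature, and the resulting feature vector would feed a classifier trained on labeled samples; the sketch stops at a single untrained feature to keep the data flow visible.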