Bytecode Similarity Detection of Smart Contract across Optimization Options and Compiler Versions Based on Triplet Network

Bibliographic Details
Main Authors: Di Zhu, Feng Yue, Jianmin Pang, Xin Zhou, Wenjie Han, Fudong Liu
Format: Article
Language: English
Published: MDPI AG 2022-02-01
Series: Electronics
Subjects: smart contract, bytecode similarity, basic block, triplet network
Online Access: https://www.mdpi.com/2079-9292/11/4/597
author Di Zhu
Feng Yue
Jianmin Pang
Xin Zhou
Wenjie Han
Fudong Liu
collection DOAJ
description In recent years, the number of smart contracts running on the blockchain has increased rapidly, accompanied by many security problems, such as vulnerability propagation caused by code reuse and malicious transactions caused by the deployment of malicious contracts. Most smart contracts publish only their bytecode, not their source code. Research on the bytecode similarity of smart contracts supports smart contract upgrades, vulnerability search, and malicious contract analysis. The difficulty of bytecode similarity research is that different compiler versions and optimization options diversify the bytecode produced from the same source code. This paper presents a solution comprising a series of methods for measuring the similarity of smart contract bytecode. Starting from the opcodes of a smart contract, a method for pre-training on basic block sequences is proposed, which embeds each basic block as a vector. Positive samples are obtained by basic block marking, and the negative sampling method is improved. The positive samples, the negative samples, and the basic blocks themselves are then fed into a triplet network composed of transformers. Our solution achieves an evaluation accuracy of 97.8%, so that the basic block sequences of optimized and unoptimized options can be transformed into each other. At the same time, the instructions are normalized, and the instruction order across compiler versions is normalized. Experiments show that our solution effectively reduces the bytecode differences caused by optimization options and compiler versions, and improves accuracy by 1.4% compared with existing work. We provide a data set covering 64 currently used Solidity compilers, including one million basic block pairs extracted from them.
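The triplet setup named in the description (a basic block, a positive sample, and a negative sample) can be illustrated with a generic triplet margin loss. This is a minimal sketch, not the authors' implementation: the record does not state the loss function, the distance metric, or the margin, so the Euclidean distance, the margin of 1.0, and the names `euclidean` and `triplet_loss` are all assumptions made for illustration. The toy two-dimensional vectors stand in for learned basic block embeddings.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss (hypothetical helper, margin is an assumption).

    The loss is zero once the anchor is closer to the positive sample than
    to the negative sample by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy embeddings: the positive sits near the anchor, the negative far away.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
negative = [3.0, 4.0]

well_separated = triplet_loss(anchor, positive, negative)
# Swapping the roles puts the "positive" far away, so the loss becomes large.
badly_separated = triplet_loss(anchor, negative, positive)
print(well_separated, badly_separated)
```

With these toy values the first triplet is already separated by more than the margin, so it contributes no loss, while the swapped triplet is penalized.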
first_indexed 2024-03-09T22:06:24Z
format Article
id doaj.art-442d63aad4d84d488060cddea423c2a5
institution Directory of Open Access Journals
issn 2079-9292
language English
last_indexed 2024-03-09T22:06:24Z
publishDate 2022-02-01
publisher MDPI AG
record_format Article
series Electronics
spelling doaj.art-442d63aad4d84d488060cddea423c2a5; 2023-11-23T19:39:56Z; eng; MDPI AG; Electronics; ISSN 2079-9292; 2022-02-01; vol. 11, no. 4, art. 597; DOI 10.3390/electronics11040597
Bytecode Similarity Detection of Smart Contract across Optimization Options and Compiler Versions Based on Triplet Network
Di Zhu, Feng Yue, Jianmin Pang, Xin Zhou, Wenjie Han, Fudong Liu (all authors: State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China)
https://www.mdpi.com/2079-9292/11/4/597
Keywords: smart contract; bytecode similarity; basic block; triplet network
title Bytecode Similarity Detection of Smart Contract across Optimization Options and Compiler Versions Based on Triplet Network
topic smart contract
bytecode similarity
basic block
triplet network
url https://www.mdpi.com/2079-9292/11/4/597