<i>DFSGraph</i>: Data Flow Semantic Model for Intermediate Representation Programs Based on Graph Network

Bibliographic Details
Main Authors: Ke Tang, Zheng Shan, Chunyan Zhang, Lianqiu Xu, Meng Qiao, Fudong Liu
Format: Article
Language: English
Published: MDPI AG, 2022-10-01
Series: Electronics
Subjects: obfuscation; deobfuscation; LLVM IR; graph network
Online Access: https://www.mdpi.com/2079-9292/11/19/3230

Abstract
With growing awareness of software copyright protection, code obfuscation plays a crucial role in protecting key code segments. As obfuscation techniques become increasingly complex and diverse, they have also spawned a large number of malware variants that can easily evade detection by anti-virus software. Malicious code detection relies mainly on binary code similarity analysis; however, existing software analysis techniques struggle to cope with increasingly complex obfuscation. To address this problem, this paper proposes a new obfuscation-resilient program analysis method based on the data flow transformation relationships in the intermediate representation (IR) and on a graph network model. In our approach, we first construct a data transformation graph (DTG) from LLVM IR. We then design a novel graph-network-based representation model for intermediate-language programs, named <i>DFSGraph</i>, to learn data flow semantics from the DTG. <i>DFSGraph</i> detects the similarity of obfuscated code by extracting the semantic information of program data flow, without deobfuscation. Extensive experiments show that our approach is more accurate than existing deobfuscation tools when searching for similar functions in obfuscated code.
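The abstract does not specify how the DTG is built or how the graph network is trained. As a rough illustration of why data-flow structure can survive superficial obfuscation such as renaming SSA values, the Python sketch below builds a def-use graph from a hypothetical, simplified subset of LLVM IR text and compares two functions using Weisfeiler-Lehman label refinement, a fixed (non-learned) procedure closely related to message passing in graph networks. This is not the authors' DTG construction or the <i>DFSGraph</i> model; the IR subset, the helper names (`build_dfg`, `wl_features`, `similarity`), and the weighted Jaccard score are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical simplified LLVM IR subset: straight-line SSA lines such as
# "%3 = add i32 %a, %b" or "ret i32 %4" (no branches, no phi nodes).
DEF_RE = re.compile(r"^\s*(%[\w.]+)\s*=\s*(\w+)\b(.*)$")
USE_RE = re.compile(r"%[\w.]+")


def build_dfg(ir_lines):
    """Build a def-use (data-flow) graph: returns node labels and, per node,
    the indices of the nodes whose values it consumes."""
    labels, preds, def_site = [], [], {}
    for line in ir_lines:
        if not line.strip():
            continue
        m = DEF_RE.match(line)
        if m:
            dest, opcode, rest = m.groups()
        else:  # instruction without a result, e.g. "ret" or "store"
            parts = line.split()
            dest, opcode, rest = None, parts[0], " ".join(parts[1:])
        operands = []
        for name in USE_RE.findall(rest):
            if name not in def_site:  # value not defined here (e.g. an argument)
                def_site[name] = len(labels)
                labels.append("arg")
                preds.append([])
            operands.append(def_site[name])
        labels.append(opcode)  # SSA names are dropped; only opcodes label nodes
        preds.append(operands)
        if dest:
            def_site[dest] = len(labels) - 1
    return labels, preds


def wl_features(labels, preds, rounds=2):
    """Weisfeiler-Lehman refinement: each round relabels a node with its label
    plus the sorted labels of its data-flow predecessors (operand order ignored)."""
    current = list(labels)
    features = Counter(current)
    for _ in range(rounds):
        current = [
            current[i] + "(" + ",".join(sorted(current[p] for p in preds[i])) + ")"
            for i in range(len(current))
        ]
        features.update(current)
    return features


def similarity(func_a, func_b):
    """Weighted Jaccard similarity of the two functions' WL feature multisets."""
    fa = wl_features(*build_dfg(func_a))
    fb = wl_features(*build_dfg(func_b))
    union = sum((fa | fb).values())
    return sum((fa & fb).values()) / union if union else 1.0


original = ["%3 = add i32 %a, %b", "%4 = mul i32 %3, %3", "ret i32 %4"]
renamed = ["%x = add i32 %p, %q", "%y = mul i32 %x, %x", "ret i32 %y"]
variant = ["%3 = sub i32 %a, %b", "%4 = mul i32 %3, %3", "ret i32 %4"]

print(f"{similarity(original, renamed):.3f}")  # → 1.000
print(f"{similarity(original, variant):.3f}")  # → 0.429
```

Renaming every SSA value leaves the score at 1.0 because node labels come only from opcodes and graph structure, while changing one opcode lowers it. A learned graph network would replace the fixed string relabeling with trainable message-passing layers; this toy version also ignores operand order and control flow.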

Affiliation (all authors): State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450000, China
Citation: Electronics, vol. 11, no. 19, article 3230 (2022)
DOI: 10.3390/electronics11193230
ISSN: 2079-9292