BinCC: Scalable Function Similarity Detection in Multiple Cross-Architectural Binaries
With the undeniable increase in popularity of open source software, the availability and reuse of source code have also increased. While the detection of code clones helps track reuse and evolution in source code, little prior work applies to binary code. This is c...
Main Authors: | Davide Pizzolotto, Katsuro Inoue |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2022-01-01 |
Series: | IEEE Access |
Subjects: | Code clones, static code analysis, reverse engineering, compilers |
Online Access: | https://ieeexplore.ieee.org/document/9964192/ |
_version_ | 1828171419251376128 |
---|---|
author | Davide Pizzolotto; Katsuro Inoue |
author_facet | Davide Pizzolotto; Katsuro Inoue |
author_sort | Davide Pizzolotto |
collection | DOAJ |
description | With the undeniable increase in popularity of open source software, the availability and reuse of source code have also increased. While the detection of code clones helps track reuse and evolution in source code, little prior work applies to binary code. The task is complicated by the transformations that compilation introduces. In this paper, we present a control-flow graph (CFG) refinement that finds function-level clones in a fast and scalable way by comparing the high-level structure of multiple disassembled binaries together. We can determine whether functions belonging to other programs have been copied or reused, even when the processor architecture is different. Specifically, our algorithm consists of extracting the control flow of each function and reconstructing a higher-level structure, leveraging architectural differences and allowing efficient comparison in linear time with structural hashing. We implemented our idea in a tool called BinCC and analyzed 24 million functions spanning different architectures and optimization levels. Results show that our approach achieves precision between 91% and 99% within the same architecture and 75% when detecting clones across different architectures, and can also detect the presence of specific library functions inside an executable. Our approach reaches precision comparable to current state-of-the-art learning approaches while being three orders of magnitude faster. |
first_indexed | 2024-04-12T03:27:06Z |
format | Article |
id | doaj.art-4ae01fb0d613479b90cf8ef9f6f2609c |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-12T03:27:06Z |
publishDate | 2022-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-4ae01fb0d613479b90cf8ef9f6f2609c; 2022-12-22T03:49:40Z; eng; IEEE; IEEE Access; 2169-3536; 2022-01-01; 10; 124491; 124506; 10.1109/ACCESS.2022.3225100; 9964192; BinCC: Scalable Function Similarity Detection in Multiple Cross-Architectural Binaries; Davide Pizzolotto (0) https://orcid.org/0000-0002-7690-6592; Katsuro Inoue (1) https://orcid.org/0000-0001-5424-0614; Osaka University, Osaka, Japan; Nanzan University, Nagoya, Japan; With the undeniable increase in popularity of open source software, the availability and reuse of source code have also increased. While the detection of code clones helps track reuse and evolution in source code, little prior work applies to binary code. The task is complicated by the transformations that compilation introduces. In this paper, we present a control-flow graph (CFG) refinement that finds function-level clones in a fast and scalable way by comparing the high-level structure of multiple disassembled binaries together. We can determine whether functions belonging to other programs have been copied or reused, even when the processor architecture is different. Specifically, our algorithm consists of extracting the control flow of each function and reconstructing a higher-level structure, leveraging architectural differences and allowing efficient comparison in linear time with structural hashing. We implemented our idea in a tool called BinCC and analyzed 24 million functions spanning different architectures and optimization levels. Results show that our approach achieves precision between 91% and 99% within the same architecture and 75% when detecting clones across different architectures, and can also detect the presence of specific library functions inside an executable. Our approach reaches precision comparable to current state-of-the-art learning approaches while being three orders of magnitude faster.; https://ieeexplore.ieee.org/document/9964192/; Code clones; static code analysis; reverse engineering; compilers |
spellingShingle | Davide Pizzolotto Katsuro Inoue BinCC: Scalable Function Similarity Detection in Multiple Cross-Architectural Binaries IEEE Access Code clones static code analysis reverse engineering compilers |
title | BinCC: Scalable Function Similarity Detection in Multiple Cross-Architectural Binaries |
title_full | BinCC: Scalable Function Similarity Detection in Multiple Cross-Architectural Binaries |
title_fullStr | BinCC: Scalable Function Similarity Detection in Multiple Cross-Architectural Binaries |
title_full_unstemmed | BinCC: Scalable Function Similarity Detection in Multiple Cross-Architectural Binaries |
title_short | BinCC: Scalable Function Similarity Detection in Multiple Cross-Architectural Binaries |
title_sort | bincc scalable function similarity detection in multiple cross architectural binaries |
topic | Code clones; static code analysis; reverse engineering; compilers |
url | https://ieeexplore.ieee.org/document/9964192/ |
work_keys_str_mv | AT davidepizzolotto binccscalablefunctionsimilaritydetectioninmultiplecrossarchitecturalbinaries AT katsuroinoue binccscalablefunctionsimilaritydetectioninmultiplecrossarchitecturalbinaries |
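
The description above mentions comparing the high-level structure of functions "in linear time with structural hashing". The following is a minimal sketch of that general idea, not BinCC's actual implementation: it assumes each function has already been reduced to a tree of control structures, and the node labels (`seq`, `loop`, `if`, `block`) and function names are hypothetical, chosen only for illustration.

```python
import hashlib
from collections import defaultdict

# Hypothetical node shape: (kind, [children]).
# The kinds used here are illustrative assumptions, not BinCC's node types.
BLOCK = ("block", [])


def structural_hash(node):
    """Hash a structure tree bottom-up; every node is visited exactly once."""
    kind, children = node
    h = hashlib.sha256(kind.encode())
    for child in children:
        h.update(structural_hash(child).encode())
    return h.hexdigest()


def group_clones(functions):
    """Bucket function names by structural hash.

    Buckets holding more than one name are clone candidates. One pass over
    all functions, so the cost is linear in the total number of tree nodes.
    """
    buckets = defaultdict(list)
    for name, tree in functions.items():
        buckets[structural_hash(tree)].append(name)
    return [sorted(names) for names in buckets.values() if len(names) > 1]


# Two functions with the same shape (block, loop, block) and one with a
# different shape (if/else followed by a block).
functions = {
    "x86_strlen": ("seq", [BLOCK, ("loop", [BLOCK]), BLOCK]),
    "arm_strlen": ("seq", [BLOCK, ("loop", [BLOCK]), BLOCK]),
    "x86_abs": ("seq", [("if", [BLOCK, BLOCK]), BLOCK]),
}

print(group_clones(functions))
```

Because the hash depends only on the shape of the tree and not on instruction bytes, two functions compiled for different architectures fall into the same bucket whenever their recovered structure matches. How the structure is recovered from a disassembled CFG is the part the paper itself addresses and is not shown here.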