Scalable code clone detection tool based on semantic analysis

This article describes methods of code clone detection. Based on an analysis of existing methods, a new approach to code clone detection is proposed for the C/C++ languages. The method is based on semantic analysis of the project, which allows code clones to be detected with high accuracy. It is realized as part...

Bibliographic Details
Main Authors: Sevak Sargsyan, Shamil Kurmnagaleev, Andrey Belevantsev, Hayk Aslanyan, Artiom Baloian
Format: Article
Language: English
Published: Ivannikov Institute for System Programming of the Russian Academy of Sciences 2018-10-01
Series: Труды Института системного программирования РАН
Subjects: semantic analysis; clone detection; PDG; LLVM
Online Access: https://ispranproceedings.elpub.ru/jour/article/view/575
author Sevak Sargsyan
Shamil Kurmnagaleev
Andrey Belevantsev
Hayk Aslanyan
Artiom Baloian
collection DOAJ
description This article describes methods of code clone detection. Based on an analysis of existing methods, a new approach to code clone detection is proposed for the C/C++ languages. The method is based on semantic analysis of the project, which allows code clones to be detected with high accuracy. It is implemented as part of the LLVM compiler, which allows it to outperform existing methods. The tool consists of three basic parts. The first part is Program Dependence Graph (PDG) generation and serialization. PDGs are constructed at compile time from LLVM's intermediate representation. Several simple optimizations are applied to these graphs, and they are then serialized to a file. The second part is the analysis of the stored PDGs. The PDGs are loaded from files and split into subgraphs. Every subgraph is considered a clone candidate. A new method is proposed for the splitting, which increases the number of detected clones. There are two types of algorithms for clone detection. Algorithms of the first type try to prove that a pair of PDGs cannot be clones. These algorithms have linear complexity, which allows a huge number of PDG pairs to be processed. If they fail to rule a pair out, graph isomorphism algorithms are applied to detect similar subgraphs. The last part is an integrated system for automatically testing the accuracy of the algorithms. For a given project, a set of clones is generated automatically, and the clone detection algorithms are then applied to the original source and the generated one.
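The two-stage scheme in the description (a linear-time check that rules pairs out, followed by an expensive graph comparison for the pairs that survive) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the article: the dictionary representation of a PDG, the opcode labels, the 0.8 overlap threshold, and the function names `cheap_filter`, `isomorphic`, and `find_clone_pairs`. The sketch also substitutes exact label-preserving isomorphism for the similar-subgraph search that the description mentions, which is a simplification.

```python
from collections import Counter

# Hypothetical PDG representation (not the tool's format):
#   {"labels": {node_id: opcode_name}, "edges": {(src_id, dst_id), ...}}


def cheap_filter(g1, g2, min_overlap=0.8):
    """Linear-time rejection test on node-label histograms.

    Returns False when the pair shares too few labels to count as a
    clone under the assumed threshold; True means "not ruled out".
    """
    h1 = Counter(g1["labels"].values())
    h2 = Counter(g2["labels"].values())
    shared = sum((h1 & h2).values())  # size of the multiset intersection
    larger = max(len(g1["labels"]), len(g2["labels"]))
    return larger > 0 and shared / larger >= min_overlap


def isomorphic(g1, g2):
    """Exact label-preserving isomorphism by backtracking.

    Exponential in the worst case, so it should only run on pairs
    that passed the cheap filter.
    """
    n1, n2 = g1["labels"], g2["labels"]
    e1, e2 = g1["edges"], g2["edges"]
    if len(n1) != len(n2) or len(e1) != len(e2):
        return False
    order = list(n1)

    def extend(mapping):
        if len(mapping) == len(order):
            return True
        u = order[len(mapping)]
        used = set(mapping.values())
        for v in n2:
            if v in used or n1[u] != n2[v]:
                continue
            # Edges between u and every already-mapped node, in both
            # directions, must match the edges around v; self-loops too.
            consistent = ((u, u) in e1) == ((v, v) in e2) and all(
                ((u, w) in e1) == ((v, mapping[w]) in e2)
                and ((w, u) in e1) == ((mapping[w], v) in e2)
                for w in mapping
            )
            if consistent:
                mapping[u] = v
                if extend(mapping):
                    return True
                del mapping[u]
        return False

    return extend({})


def find_clone_pairs(graphs):
    """Return name pairs that pass the filter and the exact check."""
    names = sorted(graphs)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cheap_filter(graphs[a], graphs[b]) and isomorphic(graphs[a], graphs[b]):
                pairs.append((a, b))
    return pairs


# Toy candidates: f and g have the same shape with renumbered nodes; h differs.
f = {"labels": {0: "load", 1: "load", 2: "add", 3: "store"},
     "edges": {(0, 2), (1, 2), (2, 3)}}
g = {"labels": {10: "add", 11: "load", 12: "store", 13: "load"},
     "edges": {(11, 10), (13, 10), (10, 12)}}
h = {"labels": {0: "load", 1: "mul", 2: "call", 3: "ret"},
     "edges": {(0, 1), (1, 2), (2, 3)}}

print(find_clone_pairs({"f": f, "g": g, "h": h}))
```

In this toy input, the pairs involving `h` are discarded by the histogram check alone, so the backtracking search runs only for `f` and `g`. That ordering is the point of the design: the cheap test keeps the number of expensive comparisons small.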
first_indexed 2024-12-18T15:19:39Z
format Article
id doaj.art-bcbe63d7559e4b2e8386507d75222ac6
institution Directory of Open Access Journals
issn 2079-8156
2220-6426
language English
last_indexed 2024-12-18T15:19:39Z
publishDate 2018-10-01
publisher Ivannikov Institute for System Programming of the Russian Academy of Sciences
record_format Article
series Труды Института системного программирования РАН
spelling doaj.art-bcbe63d7559e4b2e8386507d75222ac6; 2022-12-21T21:03:25Z; eng
Ivannikov Institute for System Programming of the Russian Academy of Sciences; Труды Института системного программирования РАН; 2079-8156; 2220-6426
2018-10-01; 27; 1; 39; 50; 10.15514/ISPRAS-2015-27(1)-3; 575
Scalable code clone detection tool based on semantic analysis
Sevak Sargsyan; Shamil Kurmnagaleev; Andrey Belevantsev; Hayk Aslanyan; Artiom Baloian (all: Institute for System Programming of the RAS, Moscow)
https://ispranproceedings.elpub.ru/jour/article/view/575
semantic analysis; clone detection; pdg; llvm
title Scalable code clone detection tool based on semantic analysis
topic semantic analysis
clone detection
pdg
llvm
url https://ispranproceedings.elpub.ru/jour/article/view/575