Code problem similarity detection using code clones and pretrained models

Bibliographic Details
Main Author: Yeo, Geremie Yun Siang
Other Authors: Anwitaman Datta
Format: Final Year Project (FYP)
Language: English
Published: Nanyang Technological University 2023
Online Access: https://hdl.handle.net/10356/165850
Description
Summary: There are many websites that host coding contests, such as LeetCode, Codeforces and CodeChef. On average, these contests attract around 20k technology enthusiasts, since getting a good rank can improve participants' problem-solving skills and strengthen their resumes during a job search. These contests typically allow problems to be solved in multiple programming languages, such as Python, C++ and Java. However, because these sites host a vast number of code problems, it is inevitable that some of them will be duplicates of, or very similar to, one another. Duplicated problems in a contest are undesirable: contestants may copy solution source code from an older problem published before the contest, gaining undeserved points and making the standings unfair. This paper proposes a solution for detecting similar code problems on Codeforces, the world's most popular competitive programming website, with over 100k active users. Similarity between problems is determined from their accepted solution source code, not from the problem text.