Code problem similarity detection using code clones and pretrained models

Bibliographic Details
Main Author: Yeo, Geremie Yun Siang
Other Authors: Anwitaman Datta
Format: Final Year Project (FYP)
Language: English
Published: Nanyang Technological University 2023
Subjects: Engineering::Computer science and engineering
Online Access: https://hdl.handle.net/10356/165850

Abstract
Many websites, such as LeetCode, Codeforces and CodeChef, host programming contests. These contests attract on average 20,000 technology enthusiasts, because a good rank can improve participants' problem-solving skills and strengthen their résumés when they search for jobs. The contests typically allow problems to be solved in multiple programming languages, such as Python, C++ and Java. However, because of the vast number of problems on these sites, some inevitably duplicate, or closely resemble, one another. Duplicated problems in a contest are undesirable: contestants may copy solution source code from an older problem published before the contest, gaining undeserved points and making the standings unfair. This paper proposes a solution to detect similar problems on Codeforces, the world’s most popular competitive programming website, with over 100,000 active users. Similarity between problems is determined from their accepted solution source code, not from the problem text.
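
The abstract states that problems are compared through their accepted solutions rather than their statements. The sketch below illustrates only that general idea, under loudly stated assumptions: it swaps in a simple near-miss clone heuristic (blinding identifiers and numeric literals, then taking the Jaccard similarity of 3-token shingles) in place of the clone detectors and pretrained models named in the title. The keyword list, the functions `normalize`, `shingles`, `jaccard`, `problem_similarity` and `similar_problem_pairs`, the shingle size k = 3, the max-over-solution-pairs aggregation and the 0.8 threshold are all hypothetical choices, not the method described in the report.

```python
import re
from itertools import combinations

# Hypothetical, language-agnostic keyword list; a real system would use a proper lexer per language.
KEYWORDS = {
    "if", "else", "for", "while", "return", "def", "in", "and", "or", "not",
    "int", "long", "double", "char", "bool", "void", "class", "public", "static",
}

# Identifiers/keywords, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(source: str) -> list[str]:
    """Tokenize and blind identifiers and numbers so renamed variables still match."""
    out = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            out.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ID")
        elif tok.isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return out


def shingles(tokens: list[str], k: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping k-token windows; inputs shorter than k give one short shingle."""
    if len(tokens) < k:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def problem_similarity(solutions_a: list[str], solutions_b: list[str], k: int = 3) -> float:
    """Hypothetical aggregation: score two problems by their most similar pair of solutions."""
    fa = [shingles(normalize(s), k) for s in solutions_a]
    fb = [shingles(normalize(s), k) for s in solutions_b]
    return max((jaccard(x, y) for x in fa for y in fb), default=0.0)


def similar_problem_pairs(problems: dict[str, list[str]],
                          threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return problem pairs scoring at or above the threshold, most similar first."""
    pairs = []
    for (pa, sols_a), (pb, sols_b) in combinations(problems.items(), 2):
        score = problem_similarity(sols_a, sols_b)
        if score >= threshold:
            pairs.append((pa, pb, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


if __name__ == "__main__":
    # Toy data: P1 and P2 differ only in a variable name; P3 solves a different task.
    problems = {
        "P1": ["n = int(input())\nprint(n * (n + 1) // 2)"],
        "P2": ["m = int(input())\nprint(m * (m + 1) // 2)"],
        "P3": ["s = input()\nprint(s[::-1])"],
    }
    print(similar_problem_pairs(problems))  # [('P1', 'P2', 1.0)]
```

Taking the maximum over solution pairs flags a problem pair as soon as any two accepted solutions look like clones, which suits the copying scenario in the abstract but is sensitive to very short, generic solutions; averaging or embedding-based comparison are other options a real system might weigh.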

Additional Details
Contributors: Anwitaman Datta (Anwitaman@ntu.edu.sg); Patrick Pun Chi Seng (cspun@ntu.edu.sg)
School: School of Computer Science and Engineering
Degree: Bachelor of Science in Mathematical and Computer Sciences
Project Code: SCSE22-0384
Date Available: 2023-04-14
DOI: 10.21979/N9/VPCR7H
File Format: application/pdf
Citation: Yeo, G. Y. S. (2023). Code problem similarity detection using code clones and pretrained models. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/165850