Dataset of Program Source Codes Solving Unique Programming Exercises Generated by Digital Teaching Assistant

This paper presents a dataset containing automatically collected source codes solving unique programming exercises of different types. The programming exercises were automatically generated by the Digital Teaching Assistant (DTA) system that automates a massive Python programming course at MIREA—Rus...

Bibliographic Details
Main Authors: Liliya A. Demidova, Elena G. Andrianova, Peter N. Sovietov, Artyom V. Gorchakov
Format: Article
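Language: English
Published: MDPI AG 2023-06-01
Series: Data
Subjects: autograding; programming exercise generation; python; online education; program text analysis; source code analysis
Online Access: https://www.mdpi.com/2306-5729/8/6/109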
_version_ 1797595313173692416
author Liliya A. Demidova
Elena G. Andrianova
Peter N. Sovietov
Artyom V. Gorchakov
collection DOAJ
description This paper presents a dataset containing automatically collected source codes solving unique programming exercises of different types. The programming exercises were automatically generated by the Digital Teaching Assistant (DTA) system that automates a massive Python programming course at MIREA—Russian Technological University (RTU MIREA). Source codes of the small programs grouped by the type of the solved task can be used for benchmarking source code classification and clustering algorithms. Moreover, the data can be used for training intelligent program synthesizers or benchmarking mutation testing frameworks, and more applications are yet to be discovered. We describe the architecture of the DTA system, aiming to provide detailed insight regarding how and why the dataset was collected. In addition, we describe the algorithms responsible for source code analysis in the DTA system. These algorithms use vector representations of programs based on Markov chains, compute pairwise Jensen–Shannon divergences of programs, and apply hierarchical clustering algorithms in order to automatically discover high-level concepts used by students while solving unique tasks. The proposed approach can be incorporated into massive programming courses when there is a need to identify approaches implemented by students.
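The analysis pipeline named in the description (Markov-chain program vectors, pairwise Jensen-Shannon divergences, hierarchical clustering) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the DTA system: the Markov chain is built over transitions between parent and child AST node types, the row-stochastic transition table is rescaled to sum to one so that it can be treated as a single distribution, and the clustering is a hand-written average-linkage procedure with a hypothetical `threshold` parameter. The function names `markov_vector`, `js_divergence`, and `cluster` are likewise hypothetical.

```python
import ast
import math
from collections import Counter, defaultdict
from itertools import combinations


def markov_vector(source):
    """Flattened transition probabilities between AST node types.

    Assumption of this sketch: a "transition" goes from a parent node type
    to a child node type. Each row is normalized to sum to one, and the
    whole table is then divided by the number of rows so that the vector
    as a whole is a probability distribution.
    """
    counts = defaultdict(Counter)
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            counts[type(parent).__name__][type(child).__name__] += 1
    vector = {}
    for src, row in counts.items():
        total = sum(row.values())
        for dst, n in row.items():
            vector[(src, dst)] = n / total / len(counts)
    return vector


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) of two sparse distributions."""
    div = 0.0
    for key in set(p) | set(q):
        pk, qk = p.get(key, 0.0), q.get(key, 0.0)
        mk = (pk + qk) / 2
        if pk > 0:
            div += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            div += 0.5 * qk * math.log2(qk / mk)
    return div


def cluster(vectors, threshold):
    """Average-linkage agglomerative clustering on pairwise divergences.

    Merging stops once the closest pair of clusters is farther apart than
    `threshold`. Returns a list of clusters, each a list of input indices.
    """
    dist = {
        frozenset((i, j)): js_divergence(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }

    def linkage(a, b):
        return sum(dist[frozenset((i, j))] for i in a for j in b) / (len(a) * len(b))

    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        if linkage(a, b) > threshold:
            break
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return clusters
```

Two programs that differ only in identifiers and constants produce identical vectors under this construction, so they are always merged first. For larger collections, library routines such as `scipy.spatial.distance.jensenshannon` (which returns the square root of the divergence) and `scipy.cluster.hierarchy.linkage` would likely be a better fit than the hand-written loops.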
first_indexed 2024-03-11T02:35:38Z
format Article
id doaj.art-399abf7c98284367973e59f3ef44de56
institution Directory Open Access Journal
issn 2306-5729
language English
last_indexed 2024-03-11T02:35:38Z
publishDate 2023-06-01
publisher MDPI AG
record_format Article
series Data
spelling doaj.art-399abf7c98284367973e59f3ef44de56
2023-11-18T09:58:35Z
eng
MDPI AG
Data
2306-5729
2023-06-01
8
6
109
10.3390/data8060109
Dataset of Program Source Codes Solving Unique Programming Exercises Generated by Digital Teaching Assistant
Liliya A. Demidova
Elena G. Andrianova
Peter N. Sovietov
Artyom V. Gorchakov
Institute of Information Technologies, Federal State Budget Educational Institution of Higher Education, MIREA—Russian Technological University, 78, Vernadsky Avenue, 119454 Moscow, Russia (affiliation of all four authors)
https://www.mdpi.com/2306-5729/8/6/109
autograding
programming exercise generation
python
online education
program text analysis
source code analysis
title Dataset of Program Source Codes Solving Unique Programming Exercises Generated by Digital Teaching Assistant
topic autograding
programming exercise generation
python
online education
program text analysis
source code analysis
url https://www.mdpi.com/2306-5729/8/6/109