Dataset of Program Source Codes Solving Unique Programming Exercises Generated by Digital Teaching Assistant
This paper presents a dataset containing automatically collected source codes solving unique programming exercises of different types. The programming exercises were automatically generated by the Digital Teaching Assistant (DTA) system that automates a massive Python programming course at MIREA—Rus...
Main Authors: | Liliya A. Demidova, Elena G. Andrianova, Peter N. Sovietov, Artyom V. Gorchakov |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-06-01 |
Series: | Data |
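Subjects: | autograding; programming exercise generation; python; online education; program text analysis; source code analysis |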
Online Access: | https://www.mdpi.com/2306-5729/8/6/109 |
_version_ | 1797595313173692416 |
---|---|
author | Liliya A. Demidova; Elena G. Andrianova; Peter N. Sovietov; Artyom V. Gorchakov |
author_sort | Liliya A. Demidova |
collection | DOAJ |
description | This paper presents a dataset containing automatically collected source codes solving unique programming exercises of different types. The programming exercises were automatically generated by the Digital Teaching Assistant (DTA) system that automates a massive Python programming course at MIREA—Russian Technological University (RTU MIREA). Source codes of the small programs grouped by the type of the solved task can be used for benchmarking source code classification and clustering algorithms. Moreover, the data can be used for training intelligent program synthesizers or benchmarking mutation testing frameworks, and more applications are yet to be discovered. We describe the architecture of the DTA system, aiming to provide detailed insight regarding how and why the dataset was collected. In addition, we describe the algorithms responsible for source code analysis in the DTA system. These algorithms use vector representations of programs based on Markov chains, compute pairwise Jensen–Shannon divergences of programs, and apply hierarchical clustering algorithms in order to automatically discover high-level concepts used by students while solving unique tasks. The proposed approach can be incorporated into massive programming courses when there is a need to identify approaches implemented by students. |
first_indexed | 2024-03-11T02:35:38Z |
format | Article |
id | doaj.art-399abf7c98284367973e59f3ef44de56 |
institution | Directory of Open Access Journals |
issn | 2306-5729 |
language | English |
last_indexed | 2024-03-11T02:35:38Z |
publishDate | 2023-06-01 |
publisher | MDPI AG |
record_format | Article |
series | Data |
spelling | doaj.art-399abf7c98284367973e59f3ef44de56; 2023-11-18T09:58:35Z; eng; MDPI AG; Data; ISSN 2306-5729; 2023-06-01; vol. 8, no. 6, art. 109; doi:10.3390/data8060109; Dataset of Program Source Codes Solving Unique Programming Exercises Generated by Digital Teaching Assistant; Liliya A. Demidova, Elena G. Andrianova, Peter N. Sovietov, Artyom V. Gorchakov (all: Institute of Information Technologies, Federal State Budget Educational Institution of Higher Education, MIREA—Russian Technological University, 78, Vernadsky Avenue, 119454 Moscow, Russia); https://www.mdpi.com/2306-5729/8/6/109; autograding; programming exercise generation; python; online education; program text analysis; source code analysis |
title | Dataset of Program Source Codes Solving Unique Programming Exercises Generated by Digital Teaching Assistant |
topic | autograding; programming exercise generation; python; online education; program text analysis; source code analysis |
url | https://www.mdpi.com/2306-5729/8/6/109 |
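
The description field names three analysis steps: Markov-chain vector representations of programs, pairwise Jensen–Shannon divergences, and hierarchical clustering. The following is a minimal Python sketch of that pipeline, not the DTA system's implementation. It assumes that the Markov chain states are Python AST node types, that a program's transition probabilities are flattened and renormalised into one distribution before comparison, and that clusters are merged by average linkage up to a cut-off. The names `markov_vector`, `js_divergence`, `cluster`, and the `threshold` parameter are hypothetical and do not come from the record.

```python
import ast
import math
from collections import Counter, defaultdict
from itertools import combinations


def markov_vector(source):
    """Map (parent node type, child node type) -> transition probability.

    Assumption: states of the chain are AST node type names.
    """
    counts = defaultdict(Counter)
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            counts[type(parent).__name__][type(child).__name__] += 1
    vector = {}
    for parent, children in counts.items():
        total = sum(children.values())
        for child, n in children.items():
            vector[(parent, child)] = n / total
    return vector


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, range 0..1) of two sparse vectors."""
    def normalise(v):
        # Assumption: flatten the chain into one distribution summing to 1.
        total = sum(v.values()) or 1.0
        return {k: x / total for k, x in v.items()}

    p, q = normalise(p), normalise(q)
    div = 0.0
    for key in set(p) | set(q):
        pk, qk = p.get(key, 0.0), q.get(key, 0.0)
        mk = (pk + qk) / 2
        if pk:
            div += 0.5 * pk * math.log2(pk / mk)
        if qk:
            div += 0.5 * qk * math.log2(qk / mk)
    return div


def cluster(sources, threshold=0.3):
    """Agglomerative clustering with average linkage.

    Merging stops once the closest pair of clusters is farther apart
    than `threshold` (a hypothetical cut-off chosen for illustration).
    Returns lists of indices into `sources`.
    """
    vectors = [markov_vector(s) for s in sources]
    dist = {
        (i, j): js_divergence(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }

    def linkage(a, b):
        pairs = [dist[min(i, j), max(i, j)] for i in a for j in b]
        return sum(pairs) / len(pairs)

    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]),
        )
        if linkage(clusters[x], clusters[y]) > threshold:
            break
        clusters[x] += clusters.pop(y)  # x < y, so index x stays valid
    return clusters


if __name__ == "__main__":
    programs = [
        "def f(xs):\n    out = []\n    for x in xs:\n        out.append(x * 2)\n    return out\n",
        "def f(xs):\n    return [x * 2 for x in xs]\n",
    ]
    print(js_divergence(markov_vector(programs[0]), markov_vector(programs[1])))
    print(cluster(programs))
```

Working on node types rather than identifiers keeps the representation insensitive to variable naming, which is one plausible reason to prefer it when grouping student solutions by approach. In practice, a library routine such as SciPy's hierarchical clustering could replace the hand-written merge loop.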