Optimizing OCR Performance for Programming Videos: The Role of Image Super-Resolution and Large Language Models

Bibliographic Details
Main Authors:	Mohammad D. Alahmadi, Moayad Alshangiti
Format:	Article
Language:	English
Published:	MDPI AG 2024-03-01
Series:	Mathematics
Subjects:	OCR (optical character recognition); code extraction; programming screencasts; image quality; pre-processing techniques; postprocessing techniques
Online Access:	https://www.mdpi.com/2227-7390/12/7/1036
Affiliation:	Department of Software Engineering, College of Computer Science and Engineering, University of Jeddah, Jeddah 23890, Saudi Arabia (both authors)
Citation:	Mathematics, vol. 12, no. 7, article 1036
DOI:	10.3390/math12071036
ISSN:	2227-7390
Collection:	DOAJ (Directory of Open Access Journals)

Abstract
The rapid evolution of video programming tutorials as a key educational resource has highlighted the need for effective code extraction methods. These tutorials, varying widely in video quality, present a challenge for accurately transcribing the embedded source code, crucial for learning and software development. This study investigates the impact of video quality on the performance of optical character recognition (OCR) engines and the potential of large language models (LLMs) to enhance code extraction accuracy. Our comprehensive empirical analysis utilizes a rich dataset of programming screencasts, involving manual transcription of source code and the application of both traditional OCR engines, like Tesseract and Google Vision, and advanced LLMs, including GPT-4V and Gemini. We investigate the efficacy of image super-resolution (SR) techniques, namely, enhanced deep super-resolution (EDSR) and multi-scale deep super-resolution (MDSR), in improving the quality of low-resolution video frames. The findings reveal significant improvements in OCR accuracy with the use of SR, particularly at lower resolutions such as 360p. LLMs demonstrate superior performance across all video qualities, indicating their robustness and advanced capabilities in diverse scenarios. This research contributes to the field of software engineering by offering a benchmark for code extraction from video tutorials and demonstrating the substantial impact of SR techniques and LLMs in enhancing the readability and reusability of code from these educational resources.

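The abstract describes scoring OCR engines and LLMs against manual transcriptions of the on-screen source code, but this record does not name the scoring metric. A common choice for comparing OCR output with a reference transcription is a character-level accuracy derived from Levenshtein edit distance. The sketch below assumes that metric; the function names `levenshtein` and `char_accuracy` are hypothetical and this is not the authors' evaluation code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (dynamic programming)."""
    # Keep only the previous row of the DP table: O(len(b)) memory.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # cost of deleting the first i characters of `a`
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Hypothetical metric: 1 - edit_distance / length of the longer string.

    The edit distance never exceeds the longer length, so the score
    always lies in [0, 1]."""
    longest = max(len(ground_truth), len(ocr_output))
    if longest == 0:
        return 1.0  # both strings empty: treat as a perfect match
    return 1.0 - levenshtein(ground_truth, ocr_output) / longest


if __name__ == "__main__":
    truth = "for (int i = 0; i < n; i++) {"  # manual transcription
    ocr = "for (int i = 0; i < n; i++} {"    # ')' misread as '}'
    print(round(char_accuracy(truth, ocr), 4))  # prints 0.9655
```

Dividing by the longer string keeps scores comparable across frames of different lengths. For source code, a whitespace- or line-aware variant may be preferable, since indentation can be significant (for example in Python) and OCR engines often collapse it.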