Investigating Models for the Transcription of Mathematical Formulas in Images

Bibliographic Details
Main Authors: Christian Feichter, Tim Schlippe
Format: Article
Language: English
Published: MDPI AG 2024-01-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/14/3/1140
Description
Summary: The automated transcription of mathematical formulas represents a complex challenge that is of great importance for digital processing and comprehensibility of mathematical content. Consequently, our goal was to analyze state-of-the-art approaches for the transcription of printed mathematical formulas on images into spoken English text. We focused on two approaches: (1) The combination of mathematical expression recognition (MER) models and natural language processing (NLP) models to convert formula images first into LaTeX code and then into text, and (2) the direct conversion of formula images into text using vision-language (VL) models. Since no dataset with printed mathematical formulas and corresponding English transcriptions existed, we created a new dataset, Formula2Text, for fine-tuning and evaluating our systems. Our best system for (1) combines the MER model LaTeX-OCR and the NLP model BART-Base, achieving a translation error rate of 36.14% compared with our reference transcriptions. In the task of converting LaTeX code to text, BART-Base, T5-Base, and FLAN-T5-Base even outperformed ChatGPT, GPT-3.5 Turbo, and GPT-4. For (2), the best VL model, TrOCR, achieves a translation error rate of 42.09%. This demonstrates that VL models, predominantly employed for classical image captioning tasks, possess significant potential for the transcription of mathematical formulas in images.
ISSN: 2076-3417
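The summary reports results as translation error rate (TER): the number of word edits needed to turn a system transcription into the reference, divided by the number of reference words. The sketch below is a simplified, hypothetical illustration and not the authors' evaluation code: it counts only insertions, deletions, and substitutions and omits TER's block-shift operation, so it can only overestimate true TER. The function names and the example sentences are invented for illustration and are not taken from Formula2Text.

```python
def word_edit_distance(hyp: str, ref: str) -> int:
    """Levenshtein distance over whitespace-separated words (hypothetical helper)."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))  # distance from an empty hypothesis prefix
    for i, hw in enumerate(h, start=1):
        cur = [i] + [0] * len(r)
        for j, rw in enumerate(r, start=1):
            cur[j] = min(
                prev[j] + 1,               # delete hypothesis word hw
                cur[j - 1] + 1,            # insert reference word rw
                prev[j - 1] + (hw != rw),  # substitute, or match at no cost
            )
        prev = cur
    return prev[len(r)]


def simplified_ter(hyp: str, ref: str) -> float:
    """Edits divided by reference length; ignores TER's shift operation."""
    n_ref = len(ref.split())
    if n_ref == 0:
        raise ValueError("reference transcription must not be empty")
    return word_edit_distance(hyp, ref) / n_ref


# Invented example: one substitution ("over" -> "divided") plus one insertion ("by").
print(simplified_ter("a over b squared", "a divided by b squared"))  # 0.4, i.e. 40%
```

A corpus-level score such as the percentages in the summary would normally aggregate edits and reference lengths over all test items; how the authors aggregated is not stated in this record.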