Toward Semi-Supervised Graphical Object Detection in Document Images

Bibliographic Details
Main Authors: Goutham Kallempudi, Khurram Azeem Hashmi, Alain Pagani, Marcus Liwicki, Didier Stricker, Muhammad Zeshan Afzal
Format: Article
Language: English
Published: MDPI AG 2022-06-01
Series: Future Internet
Online Access: https://www.mdpi.com/1999-5903/14/6/176
Description
Summary: Graphical page object detection classifies and localizes objects such as Tables and Figures in a document. As deep learning techniques for object detection have become increasingly successful, many supervised deep neural network-based methods have been introduced to recognize graphical objects in documents. However, these models require a substantial amount of labeled data for training. To address this limitation, this paper presents an end-to-end semi-supervised framework for graphical object detection in scanned document images. Our method is based on the recently proposed Soft Teacher mechanism and examines the effects of small percentages of labeled data on the classification and localization of graphical objects. On both the PubLayNet and IIIT-AR-13K datasets, the proposed approach outperforms the supervised models by a significant margin at all labeling ratios (1%, 5%, and 10%). Furthermore, the 10% PubLayNet Soft Teacher model improves the average precision of Table, Figure, and List by +5.4, +1.2, and +3.2 points, respectively, with a total mAP similar to that of the Faster-RCNN baseline. Moreover, our model trained on 10% of the IIIT-AR-13K labeled data beats the previous fully supervised method by +4.5 points.
ISSN:1999-5903