Rethinking Learnable Proposals for Graphical Object Detection in Scanned Document Images

In the age of deep learning, researchers have looked at domain adaptation under the pre-training and fine-tuning paradigm to leverage the gains in the natural image domain. These backbones and subsequent networks are designed for object detection in the natural image domain. They do not consider som...

Bibliographic Details
Main Authors: Sankalp Sinha, Khurram Azeem Hashmi, Alain Pagani, Marcus Liwicki, Didier Stricker, Muhammad Zeshan Afzal
Format: Article
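Language: English
Published: MDPI AG 2022-10-01
Series: Applied Sciences
Subjects: graphical page object detection; deep learning; computer vision; proposals; document image analysis
Online Access: https://www.mdpi.com/2076-3417/12/20/10578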
author Sankalp Sinha
Khurram Azeem Hashmi
Alain Pagani
Marcus Liwicki
Didier Stricker
Muhammad Zeshan Afzal
collection DOAJ
description In the age of deep learning, researchers have looked at domain adaptation under the pre-training and fine-tuning paradigm to leverage the gains in the natural image domain. These backbones and subsequent networks are designed for object detection in the natural image domain. They do not consider some of the critical characteristics of document images. Document images are sparse in contextual information, and the graphical page objects are logically clustered. This paper investigates the effectiveness of deep and robust backbones in the document image domain. Further, it explores the idea of learnable object proposals through Sparse R-CNN. This paper shows that simple domain adaptation of top-performing object detectors to the document image domain does not lead to better results. Furthermore, it shows empirically that detectors based on dense object priors, such as Faster R-CNN, Mask R-CNN, and Cascade Mask R-CNN, are perhaps not best suited for graphical page object detection. Detectors that reduce the number of object candidates while making them learnable are a step towards a better approach. We formulate and evaluate the Sparse R-CNN (SR-CNN) model on the IIIT-AR-13k, PubLayNet, and DocBank datasets and hope to inspire a rethinking of object proposals in the domain of graphical page object detection.
first_indexed 2024-03-09T20:45:59Z
format Article
id doaj.art-be4f88fa8a5841518cdb3ef2e5e1e063
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-09T20:45:59Z
publishDate 2022-10-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-be4f88fa8a5841518cdb3ef2e5e1e063
2023-11-23T22:46:57Z
eng
MDPI AG
Applied Sciences
2076-3417
2022-10-01
volume 12, issue 20, article 10578
DOI 10.3390/app122010578
Rethinking Learnable Proposals for Graphical Object Detection in Scanned Document Images
Sankalp Sinha (Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany)
Khurram Azeem Hashmi (Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany)
Alain Pagani (German Research Institute for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany)
Marcus Liwicki (Department of Computer Science, Luleå University of Technology, 971 87 Luleå, Sweden)
Didier Stricker (Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany)
Muhammad Zeshan Afzal (Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany)
title Rethinking Learnable Proposals for Graphical Object Detection in Scanned Document Images
topic graphical page object detection
deep learning
computer vision
proposals
document image analysis
url https://www.mdpi.com/2076-3417/12/20/10578