Rethinking Learnable Proposals for Graphical Object Detection in Scanned Document Images

In the age of deep learning, researchers have looked at domain adaptation under the pre-training and fine-tuning paradigm to leverage the gains in the natural image domain. These backbones and subsequent networks are designed for object detection in the natural image domain. They do not consider som...


Bibliographic Details
Main Authors:	Sankalp Sinha, Khurram Azeem Hashmi, Alain Pagani, Marcus Liwicki, Didier Stricker, Muhammad Zeshan Afzal
Format:	Article
Language:	English
Published:	MDPI AG 2022-10-01
Series:	Applied Sciences
Subjects:	graphical page object detection; deep learning; computer vision; proposals; document image analysis
Online Access:	https://www.mdpi.com/2076-3417/12/20/10578

_version_	1827651826730663936
author	Sankalp Sinha; Khurram Azeem Hashmi; Alain Pagani; Marcus Liwicki; Didier Stricker; Muhammad Zeshan Afzal
author_facet	Sankalp Sinha; Khurram Azeem Hashmi; Alain Pagani; Marcus Liwicki; Didier Stricker; Muhammad Zeshan Afzal
author_sort	Sankalp Sinha
collection	DOAJ
description	In the age of deep learning, researchers have looked at domain adaptation under the pre-training and fine-tuning paradigm to leverage the gains in the natural image domain. These backbones and subsequent networks are designed for object detection in the natural image domain. They do not consider some of the critical characteristics of document images. Document images are sparse in contextual information, and the graphical page objects are logically clustered. This paper investigates the effectiveness of deep and robust backbones in the document image domain. Further, it explores the idea of learnable object proposals through Sparse R-CNN. This paper shows that simple domain adaptation of top-performing object detectors to the document image domain does not lead to better results. Furthermore, it shows empirically that detectors based on dense object priors, such as Faster R-CNN, Mask R-CNN, and Cascade Mask R-CNN, are perhaps not best suited for graphical page object detection. Detectors that reduce the number of object candidates while making them learnable are a step towards a better approach. We formulate and evaluate the Sparse R-CNN (SR-CNN) model on the IIIT-AR-13k, PubLayNet, and DocBank datasets and hope to inspire a rethinking of object proposals in the domain of graphical page object detection.
first_indexed	2024-03-09T20:45:59Z
format	Article
id	doaj.art-be4f88fa8a5841518cdb3ef2e5e1e063
institution	Directory Open Access Journal
issn	2076-3417
language	English
last_indexed	2024-03-09T20:45:59Z
publishDate	2022-10-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-be4f88fa8a5841518cdb3ef2e5e1e063; 2023-11-23T22:46:57Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2022-10-01; vol. 12, no. 20, art. 10578; 10.3390/app122010578; Rethinking Learnable Proposals for Graphical Object Detection in Scanned Document Images; Sankalp Sinha (Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany); Khurram Azeem Hashmi (Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany); Alain Pagani (German Research Institute for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany); Marcus Liwicki (Department of Computer Science, Luleå University of Technology, 971 87 Luleå, Sweden); Didier Stricker (Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany); Muhammad Zeshan Afzal (Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany); https://www.mdpi.com/2076-3417/12/20/10578; graphical page object detection; deep learning; computer vision; proposals; document image analysis
title	Rethinking Learnable Proposals for Graphical Object Detection in Scanned Document Images
title_sort	rethinking learnable proposals for graphical object detection in scanned document images
topic	graphical page object detection; deep learning; computer vision; proposals; document image analysis
url	https://www.mdpi.com/2076-3417/12/20/10578


Similar Items