Rethinking Learnable Proposals for Graphical Object Detection in Scanned Document Images
In the age of deep learning, researchers have looked at domain adaptation under the pre-training and fine-tuning paradigm to leverage the gains in the natural image domain. These backbones and subsequent networks are designed for object detection in the natural image domain. They do not consider som...
Main Authors: | Sankalp Sinha, Khurram Azeem Hashmi, Alain Pagani, Marcus Liwicki, Didier Stricker, Muhammad Zeshan Afzal |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2022-10-01 |
Series: | Applied Sciences |
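Subjects: | graphical page object detection, deep learning, computer vision, proposals, document image analysis |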
Online Access: | https://www.mdpi.com/2076-3417/12/20/10578 |
_version_ | 1827651826730663936 |
---|---|
author | Sankalp Sinha Khurram Azeem Hashmi Alain Pagani Marcus Liwicki Didier Stricker Muhammad Zeshan Afzal |
author_facet | Sankalp Sinha Khurram Azeem Hashmi Alain Pagani Marcus Liwicki Didier Stricker Muhammad Zeshan Afzal |
author_sort | Sankalp Sinha |
collection | DOAJ |
description | In the age of deep learning, researchers have looked at domain adaptation under the pre-training and fine-tuning paradigm to leverage the gains in the natural image domain. These backbones and subsequent networks are designed for object detection in the natural image domain. They do not consider some of the critical characteristics of document images. Document images are sparse in contextual information, and the graphical page objects are logically clustered. This paper investigates the effectiveness of deep and robust backbones in the document image domain. Further, it explores the idea of learnable object proposals through Sparse R-CNN. This paper shows that simple domain adaptation of top-performing object detectors to the document image domain does not lead to better results. Furthermore, it shows empirically that detectors based on dense object priors, such as Faster R-CNN, Mask R-CNN, and Cascade Mask R-CNN, are perhaps not best suited for graphical page object detection. Detectors that reduce the number of object candidates while making them learnable are a step towards a better approach. We formulate and evaluate the Sparse R-CNN (SR-CNN) model on the IIIT-AR-13k, PubLayNet, and DocBank datasets and hope to inspire a rethinking of object proposals in the domain of graphical page object detection. |
first_indexed | 2024-03-09T20:45:59Z |
format | Article |
id | doaj.art-be4f88fa8a5841518cdb3ef2e5e1e063 |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-09T20:45:59Z |
publishDate | 2022-10-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-be4f88fa8a5841518cdb3ef2e5e1e063; 2023-11-23T22:46:57Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2022-10-01; 12; 20; 10578; 10.3390/app122010578; Rethinking Learnable Proposals for Graphical Object Detection in Scanned Document Images; Sankalp Sinha (0), Khurram Azeem Hashmi (1), Alain Pagani (2), Marcus Liwicki (3), Didier Stricker (4), Muhammad Zeshan Afzal (5); (0) Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany; (1) Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany; (2) German Research Institute for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany; (3) Department of Computer Science, Luleå University of Technology, 971 87 Luleå, Sweden; (4) Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany; (5) Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany; In the age of deep learning, researchers have looked at domain adaptation under the pre-training and fine-tuning paradigm to leverage the gains in the natural image domain. These backbones and subsequent networks are designed for object detection in the natural image domain. They do not consider some of the critical characteristics of document images. Document images are sparse in contextual information, and the graphical page objects are logically clustered. This paper investigates the effectiveness of deep and robust backbones in the document image domain. Further, it explores the idea of learnable object proposals through Sparse R-CNN. This paper shows that simple domain adaptation of top-performing object detectors to the document image domain does not lead to better results. Furthermore, it shows empirically that detectors based on dense object priors, such as Faster R-CNN, Mask R-CNN, and Cascade Mask R-CNN, are perhaps not best suited for graphical page object detection. Detectors that reduce the number of object candidates while making them learnable are a step towards a better approach. We formulate and evaluate the Sparse R-CNN (SR-CNN) model on the IIIT-AR-13k, PubLayNet, and DocBank datasets and hope to inspire a rethinking of object proposals in the domain of graphical page object detection.; https://www.mdpi.com/2076-3417/12/20/10578; graphical page object detection; deep learning; computer vision; proposals; document image analysis |
spellingShingle | Sankalp Sinha Khurram Azeem Hashmi Alain Pagani Marcus Liwicki Didier Stricker Muhammad Zeshan Afzal Rethinking Learnable Proposals for Graphical Object Detection in Scanned Document Images Applied Sciences graphical page object detection deep learning computer vision proposals document image analysis |
title | Rethinking Learnable Proposals for Graphical Object Detection in Scanned Document Images |
title_full | Rethinking Learnable Proposals for Graphical Object Detection in Scanned Document Images |
title_fullStr | Rethinking Learnable Proposals for Graphical Object Detection in Scanned Document Images |
title_full_unstemmed | Rethinking Learnable Proposals for Graphical Object Detection in Scanned Document Images |
title_short | Rethinking Learnable Proposals for Graphical Object Detection in Scanned Document Images |
title_sort | rethinking learnable proposals for graphical object detection in scanned document images |
topic | graphical page object detection deep learning computer vision proposals document image analysis |
url | https://www.mdpi.com/2076-3417/12/20/10578 |
work_keys_str_mv | AT sankalpsinha rethinkinglearnableproposalsforgraphicalobjectdetectioninscanneddocumentimages AT khurramazeemhashmi rethinkinglearnableproposalsforgraphicalobjectdetectioninscanneddocumentimages AT alainpagani rethinkinglearnableproposalsforgraphicalobjectdetectioninscanneddocumentimages AT marcusliwicki rethinkinglearnableproposalsforgraphicalobjectdetectioninscanneddocumentimages AT didierstricker rethinkinglearnableproposalsforgraphicalobjectdetectioninscanneddocumentimages AT muhammadzeshanafzal rethinkinglearnableproposalsforgraphicalobjectdetectioninscanneddocumentimages |