Scene Text Segmentation via Multi-Task Cascade Transformer With Paired Data Synthesis

The scene text segmentation task provides a wide range of practical applications. However, the number of images in the available datasets for scene text segmentation is not large enough to effectively train deep learning-based models, leading to limited performance. To solve this problem, we employ...

Full description

Bibliographic Details
Main Authors: Quang-Vinh Dang, Guee-Sang Lee
Format: Article
Language:English
Published: IEEE 2023-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/10172213/
Description
Summary:The scene text segmentation task provides a wide range of practical applications. However, the number of images in the available datasets for scene text segmentation is not large enough to effectively train deep learning-based models, leading to limited performance. To solve this problem, we employ paired data generation to secure sufficient data samples for text segmentation via Text Image-conditional GANs. Furthermore, existing models implicitly model text attributes such as size, layout, font, and structure, which hinders their performance. To remedy this, we propose a Multi-task Cascade Transformer network that explicitly learns these attributes using large volumes of generated synthetic data. The transformer-based network includes two auxiliary tasks and one main task for text segmentation. The auxiliary tasks help the network learn text regions to focus on, as well as the structure of the text through different words and fonts, to support the main task. To bridge the gap between different datasets, we train the proposed network on paired synthetic data before fine-tuning it on real data. Our experiments on publicly available scene text segmentation datasets show that our method outperforms existing methods.
ISSN:2169-3536