An end-to-end text spotter with text relation networks

Bibliographic Details
Main Authors: Jianguo Jiang, Baole Wei, Min Yu, Gang Li, Boquan Li, Chao Liu, Min Li, Weiqing Huang
Format: Article
Language: English
Published: SpringerOpen 2021-04-01
Series: Cybersecurity
Subjects: Scene text spotting, Graph convolutional network, Visual reasoning
Online Access: https://doi.org/10.1186/s42400-021-00073-x
collection DOAJ
description Abstract Reading text in images automatically has become an attractive research topic in computer vision. In particular, end-to-end spotting of scene text has attracted significant research attention, and relatively high accuracy has been achieved on several datasets. However, most existing works overlook the semantic connections between scene text instances and are limited in situations such as occlusion, blurring, and unseen characters, in which some semantic information in the text regions is lost. Texts within a scene image are generally related to one another. From the perspective of cognitive psychology, humans often combine nearby easy-to-recognize texts to infer text they cannot identify directly. In this paper, we propose a novel graph-based method for enhancing intermediate semantic features, called Text Relation Networks. Specifically, we model the co-occurrence relationship of scene texts as a graph. The nodes of the graph represent the text instances in a scene image, and the corresponding semantic features serve as the node representations. The relative positions between text instances determine the edge weights of the graph. A convolution operation is then performed on the graph to aggregate semantic information and enhance the intermediate features of the text instances. We evaluate the proposed method through comprehensive experiments on several mainstream benchmarks and obtain highly competitive results. For example, on SCUT-CTW1500, our method surpasses the previous top works by 2.1% on the word spotting task.
first_indexed 2024-12-17T00:50:52Z
format Article
id doaj.art-1852b9e82c194946a7ef9d407c25f79f
institution Directory Open Access Journal
issn 2523-3246
language English
last_indexed 2024-12-17T00:50:52Z
publishDate 2021-04-01
publisher SpringerOpen
record_format Article
series Cybersecurity
spelling doaj.art-1852b9e82c194946a7ef9d407c25f79f
2022-12-21T22:09:46Z
eng
SpringerOpen
Cybersecurity, 2523-3246, 2021-04-01, 4(1), 1-13
10.1186/s42400-021-00073-x
An end-to-end text spotter with text relation networks
Jianguo Jiang (Institute of Information Engineering, Chinese Academy of Sciences)
Baole Wei (Institute of Information Engineering, Chinese Academy of Sciences)
Min Yu (Institute of Information Engineering, Chinese Academy of Sciences)
Gang Li (Centre for Cyber Security Research and Innovation, Deakin University)
Boquan Li (Institute of Information Engineering, Chinese Academy of Sciences)
Chao Liu (Institute of Information Engineering, Chinese Academy of Sciences)
Min Li (Institute of Information Engineering, Chinese Academy of Sciences)
Weiqing Huang (Institute of Information Engineering, Chinese Academy of Sciences)
https://doi.org/10.1186/s42400-021-00073-x
Scene text spotting
Graph convolutional network
Visual reasoning
title An end-to-end text spotter with text relation networks
topic Scene text spotting
Graph convolutional network
Visual reasoning
url https://doi.org/10.1186/s42400-021-00073-x
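
The description in this record outlines a graph in which nodes are text instances, node representations are their semantic features, edge weights come from relative positions, and a graph convolution aggregates information across nodes. The following is only a minimal sketch of that idea, not the authors' implementation. The record does not give the weighting formula or the layer design, so everything specific here is an assumption: the Gaussian distance weighting, the `sigma` parameter, the ReLU, the residual connection, and the hypothetical function names `edge_weights` and `graph_conv`.

```python
import math


def edge_weights(centers, sigma=100.0):
    """Build a row-normalized weight matrix from text-box centers.

    Assumption: closer text instances get larger weights through a
    Gaussian of the Euclidean distance between their centers.
    """
    n = len(centers)
    weights = []
    for i in range(n):
        row = []
        for j in range(n):
            dist = math.dist(centers[i], centers[j])
            row.append(math.exp(-(dist ** 2) / (2 * sigma ** 2)))
        total = sum(row)
        weights.append([w / total for w in row])  # each row sums to 1
    return weights


def graph_conv(features, weights, proj):
    """Apply one graph-convolution step to the node features.

    Each node aggregates the weighted features of all nodes, the result
    is projected by the d x d matrix `proj`, passed through a ReLU, and
    added back to the node's original feature (assumed residual).
    """
    n, d = len(features), len(features[0])
    enhanced = []
    for i in range(n):
        # Weighted sum of every node's feature vector.
        agg = [sum(weights[i][j] * features[j][k] for j in range(n))
               for k in range(d)]
        # Linear projection of the aggregated vector.
        projected = [sum(agg[k] * proj[k][m] for k in range(d))
                     for m in range(d)]
        # ReLU followed by the residual connection.
        enhanced.append([features[i][m] + max(0.0, projected[m])
                         for m in range(d)])
    return enhanced


# Three hypothetical text instances: two close together, one far away.
centers = [(10.0, 10.0), (20.0, 10.0), (400.0, 300.0)]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for a learned projection

weights = edge_weights(centers)
enhanced = graph_conv(features, weights, identity)
```

In this toy setup the first two instances sit close together, so each contributes strongly to the other's enhanced feature, while the distant third instance contributes almost nothing. In a trained model the projection matrix would be learned rather than fixed to the identity.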