An end-to-end text spotter with text relation networks

Bibliographic Details
Main Authors:	Jianguo Jiang, Baole Wei, Min Yu, Gang Li, Boquan Li, Chao Liu, Min Li, Weiqing Huang
Format:	Article
Language:	English
Published:	SpringerOpen 2021-04-01
Series:	Cybersecurity
Subjects:	Scene text spotting; Graph convolutional network; Visual reasoning
ISSN:	2523-3246
Affiliations:	Institute of Information Engineering, Chinese Academy of Sciences (Jianguo Jiang, Baole Wei, Min Yu, Boquan Li, Chao Liu, Min Li, Weiqing Huang); Centre for Cyber Security Research and Innovation, Deakin University (Gang Li)
Indexed in:	Directory of Open Access Journals (DOAJ)
Online Access:	https://doi.org/10.1186/s42400-021-00073-x

Abstract: Reading text in images automatically has become an attractive research topic in computer vision. In particular, end-to-end scene text spotting has attracted significant research attention, and relatively high accuracy has been achieved on several datasets. However, most existing works overlook the semantic connections between scene text instances and have limitations in situations such as occlusion, blurring, and unseen characters, which cause some of the semantic information in the text regions to be lost. Texts within a scene image are generally related to one another. From the perspective of cognitive psychology, humans often combine nearby, easy-to-recognize texts to infer text they cannot identify. In this paper, we propose a novel graph-based method for enhancing intermediate semantic features, called Text Relation Networks. Specifically, we model the co-occurrence relationship of scene texts as a graph. The nodes in the graph represent the text instances in a scene image, and the corresponding semantic features serve as the node representations. The relative positions between text instances are used as the edge weights of the graph. A convolution operation is then performed on the graph to aggregate semantic information and enhance the intermediate features of the text instances. We evaluate the proposed method through comprehensive experiments on several mainstream benchmarks and obtain highly competitive results. For example, on SCUT-CTW1500, our method surpasses the previous best methods by 2.1% on the word spotting task.
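The graph step described in the abstract (text instances as nodes, semantic features as node representations, relative positions as edge weights, then a graph convolution to aggregate context) can be illustrated with a generic graph-convolution sketch. This is not the authors' implementation. The axis-aligned box format, the Gaussian kernel on box-centre distance, the `sigma` value, the symmetric normalisation, and the residual addition in `enhance` are all assumptions chosen for illustration, and every helper name (`edge_weights`, `normalize`, `gcn_layer`, `enhance`) is hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) for each text instance.
Box = Tuple[float, float, float, float]
Matrix = List[List[float]]


def center(box: Box) -> Tuple[float, float]:
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def edge_weights(boxes: List[Box], sigma: float = 100.0) -> Matrix:
    """Weight each pair of text instances by how close their box centres are.

    Assumption: a Gaussian kernel on centre distance stands in for the
    paper's (unspecified here) use of relative positions as edge weights.
    """
    centres = [center(b) for b in boxes]
    n = len(boxes)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                adj[i][j] = 1.0  # self-loop: each node keeps its own features
            else:
                d = math.dist(centres[i], centres[j])
                adj[i][j] = math.exp(-(d * d) / (2.0 * sigma * sigma))
    return adj


def normalize(adj: Matrix) -> Matrix:
    """Symmetric normalisation D^{-1/2} A D^{-1/2}, as in a standard GCN."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(adj_norm: Matrix, feats: Matrix, weight: Matrix) -> Matrix:
    """One graph-convolution step: ReLU(A_norm @ H @ W)."""
    agg = matmul(matmul(adj_norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in agg]


def enhance(boxes: List[Box], feats: Matrix, weight: Matrix,
            sigma: float = 100.0) -> Matrix:
    """Add aggregated neighbour context to each node's own features.

    Assumption: a residual sum; `weight` must be square (d x d) for this.
    """
    ctx = gcn_layer(normalize(edge_weights(boxes, sigma)), feats, weight)
    return [[f + c for f, c in zip(fr, cr)] for fr, cr in zip(feats, ctx)]


if __name__ == "__main__":
    # Two words on the same line and one far below them.
    boxes = [(0.0, 0.0, 100.0, 30.0), (110.0, 0.0, 200.0, 30.0),
             (0.0, 500.0, 100.0, 530.0)]
    feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy 2-d semantic features
    weight = [[1.0, 0.0], [0.0, 1.0]]             # identity projection
    for row in enhance(boxes, feats, weight):
        print([round(v, 3) for v in row])
```

With these toy inputs, the two neighbouring words exchange a noticeable share of each other's features, while the distant third word is left almost unchanged. The Gaussian kernel was picked only because it makes nearby instances dominate the aggregation; the weighting actually used in the paper may differ.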

Similar Items