An end-to-end text spotter with text relation networks
Abstract Reading text in images automatically has become an attractive research topic in computer vision. Specifically, end-to-end spotting of scene text has attracted significant research attention, and relatively ideal accuracy has been achieved on several datasets. However, most of the existing w...
Main Authors: | Jianguo Jiang, Baole Wei, Min Yu, Gang Li, Boquan Li, Chao Liu, Min Li, Weiqing Huang |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen 2021-04-01 |
Series: | Cybersecurity |
Subjects: | Scene text spotting, Graph convolutional network, Visual reasoning |
Online Access: | https://doi.org/10.1186/s42400-021-00073-x |
_version_ | 1818646704256188416 |
---|---|
author | Jianguo Jiang; Baole Wei; Min Yu; Gang Li; Boquan Li; Chao Liu; Min Li; Weiqing Huang |
author_facet | Jianguo Jiang; Baole Wei; Min Yu; Gang Li; Boquan Li; Chao Liu; Min Li; Weiqing Huang |
author_sort | Jianguo Jiang |
collection | DOAJ |
description | Automatically reading text in images has become an attractive research topic in computer vision. In particular, end-to-end spotting of scene text has attracted significant research attention, and relatively high accuracy has been achieved on several datasets. However, most existing works overlook the semantic connections between scene text instances and are limited in situations such as occlusion, blurring, and unseen characters, where some semantic information in the text regions is lost. Texts within a scene image are generally related to one another. From the perspective of cognitive psychology, humans often use nearby easy-to-recognize texts to infer text they cannot identify directly. In this paper, we propose a novel graph-based method for enhancing intermediate semantic features, called Text Relation Networks. Specifically, we model the co-occurrence relationship of scene texts as a graph. The nodes in the graph represent the text instances in a scene image, and the corresponding semantic features serve as the node representations. The relative positions between text instances determine the edge weights of the graph. A convolution operation is then performed on the graph to aggregate semantic information and enhance the intermediate features corresponding to the text instances. We evaluate the proposed method through comprehensive experiments on several mainstream benchmarks and obtain highly competitive results. For example, on SCUT-CTW1500, our method surpasses the previous top works by 2.1% on the word spotting task. |
first_indexed | 2024-12-17T00:50:52Z |
format | Article |
id | doaj.art-1852b9e82c194946a7ef9d407c25f79f |
institution | Directory of Open Access Journals |
issn | 2523-3246 |
language | English |
last_indexed | 2024-12-17T00:50:52Z |
publishDate | 2021-04-01 |
publisher | SpringerOpen |
record_format | Article |
series | Cybersecurity |
spelling | doaj.art-1852b9e82c194946a7ef9d407c25f79f; 2022-12-21T22:09:46Z; eng; SpringerOpen; Cybersecurity; 2523-3246; 2021-04-01; 41113; 10.1186/s42400-021-00073-x; An end-to-end text spotter with text relation networks; Jianguo Jiang (Institute of Information Engineering, Chinese Academy of Sciences); Baole Wei (Institute of Information Engineering, Chinese Academy of Sciences); Min Yu (Institute of Information Engineering, Chinese Academy of Sciences); Gang Li (Centre for Cyber Security Research and Innovation, Deakin University); Boquan Li (Institute of Information Engineering, Chinese Academy of Sciences); Chao Liu (Institute of Information Engineering, Chinese Academy of Sciences); Min Li (Institute of Information Engineering, Chinese Academy of Sciences); Weiqing Huang (Institute of Information Engineering, Chinese Academy of Sciences); https://doi.org/10.1186/s42400-021-00073-x; Scene text spotting; Graph convolutional network; Visual reasoning |
spellingShingle | Jianguo Jiang; Baole Wei; Min Yu; Gang Li; Boquan Li; Chao Liu; Min Li; Weiqing Huang; An end-to-end text spotter with text relation networks; Cybersecurity; Scene text spotting; Graph convolutional network; Visual reasoning |
title | An end-to-end text spotter with text relation networks |
title_full | An end-to-end text spotter with text relation networks |
title_fullStr | An end-to-end text spotter with text relation networks |
title_full_unstemmed | An end-to-end text spotter with text relation networks |
title_short | An end-to-end text spotter with text relation networks |
title_sort | end to end text spotter with text relation networks |
topic | Scene text spotting; Graph convolutional network; Visual reasoning |
url | https://doi.org/10.1186/s42400-021-00073-x |
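
The graph construction summarized in the description field can be illustrated with a minimal Python sketch. It is not the authors' implementation: this record does not specify how relative positions become edge weights or how the convolution is parameterized, so the exponential distance kernel, the `scale` parameter, the row normalization, the ReLU, and all function names below are assumptions chosen for illustration.

```python
import math


def edge_weights(centers, scale=100.0):
    """Build a dense adjacency matrix from text-instance centers.

    Assumption: weight = exp(-distance / scale), so nearby text
    instances influence each other more. The diagonal is 1.0 (self-loop).
    """
    n = len(centers)
    adj = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(centers):
        for j, (xj, yj) in enumerate(centers):
            adj[i][j] = math.exp(-math.hypot(xi - xj, yi - yj) / scale)
    return adj


def row_normalize(adj):
    """Scale each row to sum to 1 so aggregation is a weighted average."""
    return [[w / sum(row) for w in row] for row in adj]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def graph_conv(features, adj, weight):
    """One graph-convolution step: ReLU(A_norm @ H @ W)."""
    aggregated = matmul(row_normalize(adj), features)
    projected = matmul(aggregated, weight)
    return [[max(0.0, v) for v in row] for row in projected]


# Three hypothetical text instances: two close together, one far away.
centers = [(10, 10), (20, 10), (500, 400)]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy semantic features
identity = [[1.0, 0.0], [0.0, 1.0]]              # stand-in for learned W
enhanced = graph_conv(features, edge_weights(centers), identity)
```

In this toy setup, the first instance's enhanced feature draws mostly on itself and its close neighbour, while the distant third instance contributes very little. In a trained model the identity matrix would be replaced by learned weights.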