STELA: A Real-Time Scene Text Detector With Learned Anchor

To achieve high coverage of target boxes, a normal strategy of conventional one-stage anchor-based detectors is to utilize multiple priors at each spatial position, especially in scene text detection tasks. In this work, we present a simple and intuitive method for multi-oriented text detection wher...

Bibliographic Details
Main Authors: Linjie Deng, Yanxiang Gong, Xinchen Lu, Yi Lin, Zheng Ma, Mei Xie
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Scene text detection, real-time detector, learned anchor
Online Access: https://ieeexplore.ieee.org/document/8877714/
_version_ 1818349432054218752
author Linjie Deng
Yanxiang Gong
Xinchen Lu
Yi Lin
Zheng Ma
Mei Xie
collection DOAJ
description To achieve high coverage of target boxes, a normal strategy of conventional one-stage anchor-based detectors is to utilize multiple priors at each spatial position, especially in scene text detection tasks. In this work, we present a simple and intuitive method for multi-oriented text detection where each feature-map location is associated with only one reference box. The idea is inspired by the two-stage R-CNN framework, which can estimate the location of objects of any shape by using learned proposals. Our method integrates this mechanism into a one-stage detector: a learned anchor, obtained through a regression operation, replaces the original anchor in the final predictions. Based on RetinaNet, our method achieves competitive performance on several public benchmarks at real-time speed (26.5 fps at 800p), which surpasses all anchor-based scene text detectors. In addition, because it requires less attention to anchor design, we believe our method can be easily applied to other analogous detection tasks. The code is publicly available at https://github.com/xhzdeng/stela.
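The two-step idea in the description (regress a learned anchor from a single prior, then regress the final box relative to that learned anchor) can be illustrated with a minimal sketch. This is not the authors' implementation, which is available at the repository linked above. It assumes axis-aligned boxes and the common R-CNN offset parameterisation, whereas the paper targets multi-oriented text; the names `Box`, `Deltas`, `decode`, and `detect_at_location` are hypothetical.

```python
import math
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned box as centre, width, height (an assumed simplification;
    the rotation angle needed for multi-oriented text is omitted)."""
    cx: float
    cy: float
    w: float
    h: float


class Deltas(NamedTuple):
    """Regression offsets in the common R-CNN parameterisation (assumed)."""
    dx: float
    dy: float
    dw: float
    dh: float


def decode(reference: Box, d: Deltas) -> Box:
    """Apply regression offsets to a reference box."""
    return Box(
        cx=reference.cx + d.dx * reference.w,   # shift centre by a fraction of the width
        cy=reference.cy + d.dy * reference.h,   # shift centre by a fraction of the height
        w=reference.w * math.exp(d.dw),         # scale width in log space
        h=reference.h * math.exp(d.dh),         # scale height in log space
    )


def detect_at_location(prior: Box,
                       anchor_deltas: Deltas,
                       final_deltas: Deltas) -> Tuple[Box, Box]:
    """Hypothetical two-step regression for one feature-map location.

    Step 1: a single prior is regressed into a learned anchor.
    Step 2: the final box is regressed relative to the learned anchor,
            which replaces the original prior as the reference.
    """
    learned_anchor = decode(prior, anchor_deltas)
    prediction = decode(learned_anchor, final_deltas)
    return learned_anchor, prediction


if __name__ == "__main__":
    prior = Box(cx=100.0, cy=100.0, w=32.0, h=32.0)
    # Made-up offsets standing in for network outputs.
    anchor, box = detect_at_location(
        prior,
        Deltas(dx=0.5, dy=0.0, dw=math.log(4.0), dh=0.0),
        Deltas(dx=0.0, dy=0.25, dw=0.0, dh=0.0),
    )
    print(anchor)
    print(box)
```

In this sketch the square prior is first stretched into a wide learned anchor, and the second set of offsets is then interpreted relative to that anchor rather than to the prior, which is why one prior per location can still cover elongated text.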
first_indexed 2024-12-13T18:05:51Z
format Article
id doaj.art-e5b9626a63904fb38437ca428b4488a4
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-13T18:05:51Z
publishDate 2019-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-e5b9626a63904fb38437ca428b4488a4
2022-12-21T23:36:04Z
eng
IEEE
IEEE Access
2169-3536
2019-01-01
Volume 7, pages 153400-153407
10.1109/ACCESS.2019.2948405
8877714
STELA: A Real-Time Scene Text Detector With Learned Anchor
Linjie Deng (https://orcid.org/0000-0002-4921-8639), School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China
Yanxiang Gong, School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China
Xinchen Lu, School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China
Yi Lin (https://orcid.org/0000-0002-7194-5023), National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu, China
Zheng Ma, School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China
Mei Xie, School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China
https://ieeexplore.ieee.org/document/8877714/
Scene text detection
real-time detector
learned anchor
title STELA: A Real-Time Scene Text Detector With Learned Anchor
topic Scene text detection
real-time detector
learned anchor
url https://ieeexplore.ieee.org/document/8877714/