FDTA: Fully Convolutional Scene Text Detection With Text Attention

Text detection is a prerequisite for text recognition, and multi-oriented text detection is a current research hotspot. Because natural scene text varies in size, spatial layout, color, and orientation, detecting it remains very challenging. Therefore...

Bibliographic Details
Main Authors: Yongcun Cao, Shuaisen Ma, Haichuan Pan
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: Scene text detection; full convolution network; DR Loss; convolutional neural network
Online Access: https://ieeexplore.ieee.org/document/9174729/
collection DOAJ
description Text detection is a prerequisite for text recognition, and multi-oriented text detection is a current research hotspot. Because natural scene text varies in size, spatial layout, color, and orientation, detecting it remains very challenging. This paper therefore proposes a simple and fast multi-oriented text detection method. First, the method optimizes the regression branch with a diagonal adjustment factor that makes position regression more accurate, which increases F-score by 0.8. Second, an attention module is added to the model, which improves accuracy on small text regions and increases F-score by 1.2. Third, DR Loss is introduced to address the imbalance between positive and negative samples, which increases F-score by 0.5. Finally, the method is evaluated on the ICDAR 2015, MSRA-TD500, and ICDAR 2013 datasets. The results show that it significantly improves the precision and recall of scene text detection and is competitive with existing advanced methods. At 720p resolution, the proposed method achieves an F-score of 0.849 at 9.9 fps on ICDAR 2015, 0.772 on MSRA-TD500, and 0.887 on ICDAR 2013.
first_indexed 2024-12-22T20:13:54Z
id doaj.art-2f0687dbd1194cbcbbc16648c0ece9ae
institution Directory Open Access Journal
issn 2169-3536
last_indexed 2024-12-22T20:13:54Z
doi 10.1109/ACCESS.2020.3018784
volume 8
pages 155441-155449
orcid Yongcun Cao: https://orcid.org/0000-0002-0125-4526
Shuaisen Ma: https://orcid.org/0000-0002-9399-5806
affiliation School of Information Engineering, Minzu University of China, Beijing, China
topic Scene text detection
full convolution network
DR Loss
convolutional neural network