FDTA: Fully Convolutional Scene Text Detection With Text Attention

Text detection is the premise and guarantee of text recognition. Multi-oriented text detection is the current research hotspot. Due to the variability in size, spatial layout, color and the arrangement direction of natural scene text, natural scene text detection is still very challenging. Therefore...

Bibliographic Details
Main Authors:	Yongcun Cao, Shuaisen Ma, Haichuan Pan
Format:	Article
Language:	English
Published:	IEEE 2020-01-01
Series:	IEEE Access
Subjects:	Scene text detection; full convolution network; DR Loss; convolutional neural network
Online Access:	https://ieeexplore.ieee.org/document/9174729/

_version_	1819172861377511424
author	Yongcun Cao, Shuaisen Ma, Haichuan Pan
collection	DOAJ
description	Text detection is the premise and guarantee of text recognition. Multi-oriented text detection is the current research hotspot. Due to the variability in size, spatial layout, color and the arrangement direction of natural scene text, natural scene text detection is still very challenging. Therefore, this paper proposes a simple and fast multi-oriented text detection method. Our method first optimizes the regression branch by designing a diagonal adjustment factor to make the position regression more accurate, which increases F-score by 0.8. Secondly, we add an attention module to the model, which improves the accuracy of detecting small text regions and increases F-score by 1.2. Then, we introduce DR Loss to solve the problem of positive and negative sample imbalance, which increases F-score by 0.5. Finally, we conduct experimental verification and analysis on the ICDAR 2015, MSRA-TD500 and ICDAR 2013 datasets. The experimental results demonstrate that this method can significantly improve the precision and recall of scene text detection, and it has achieved competitive results compared with existing advanced methods. On the ICDAR 2015 dataset, the proposed method achieves an F-score of 0.849 at 9.9 fps at 720p resolution. On the MSRA-TD500 dataset, the proposed method achieves an F-score of 0.772 at 720p resolution. On the ICDAR 2013 dataset, the proposed method achieves an F-score of 0.887 at 720p resolution.
first_indexed	2024-12-22T20:13:54Z
format	Article
id	doaj.art-2f0687dbd1194cbcbbc16648c0ece9ae
institution	Directory of Open Access Journals
issn	2169-3536
language	English
last_indexed	2024-12-22T20:13:54Z
publishDate	2020-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-2f0687dbd1194cbcbbc16648c0ece9ae | 2022-12-21T18:14:00Z | eng | IEEE | IEEE Access | 2169-3536 | 2020-01-01 | 8 | 155441 | 155449 | 10.1109/ACCESS.2020.3018784 | 9174729 | FDTA: Fully Convolutional Scene Text Detection With Text Attention | Yongcun Cao (0) https://orcid.org/0000-0002-0125-4526 | Shuaisen Ma (1) https://orcid.org/0000-0002-9399-5806 | Haichuan Pan (2) | School of Information Engineering, Minzu University of China, Beijing, China | School of Information Engineering, Minzu University of China, Beijing, China | School of Information Engineering, Minzu University of China, Beijing, China | https://ieeexplore.ieee.org/document/9174729/ | Scene text detection | full convolution network | DR Loss | convolutional neural network
title	FDTA: Fully Convolutional Scene Text Detection With Text Attention
topic	Scene text detection; full convolution network; DR Loss; convolutional neural network
url	https://ieeexplore.ieee.org/document/9174729/

Similar Items