FDTA: Fully Convolutional Scene Text Detection With Text Attention
Text detection is the premise and guarantee of text recognition, and multi-oriented text detection is a current research hotspot. Because natural scene text varies in size, spatial layout, color, and orientation, detecting it remains very challenging. Therefore...
Main Authors: | Yongcun Cao, Shuaisen Ma, Haichuan Pan |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2020-01-01 |
Series: | IEEE Access |
Subjects: | Scene text detection; full convolution network; DR Loss; convolutional neural network |
Online Access: | https://ieeexplore.ieee.org/document/9174729/ |
_version_ | 1819172861377511424 |
---|---|
author | Yongcun Cao Shuaisen Ma Haichuan Pan |
author_sort | Yongcun Cao |
collection | DOAJ |
description | Text detection is the premise and guarantee of text recognition, and multi-oriented text detection is a current research hotspot. Because natural scene text varies in size, spatial layout, color, and orientation, detecting it remains very challenging. This paper therefore proposes a simple and fast multi-oriented text detection method. First, we optimize the regression branch by designing a diagonal adjustment factor that makes position regression more accurate, which increases F-score by 0.8. Second, we add an attention module to the model, which improves the accuracy of detecting small text regions and increases F-score by 1.2. Third, we introduce DR Loss to address the imbalance between positive and negative samples, which increases F-score by 0.5. Finally, we conduct experimental verification and analysis on the ICDAR 2015, MSRA-TD500, and ICDAR 2013 datasets. The experimental results demonstrate that this method can significantly improve the precision and recall of scene text detection, and that it achieves competitive results compared with existing advanced methods. On the ICDAR 2015 dataset, the proposed method achieves an F-score of 0.849 at 9.9 fps at 720p resolution. On the MSRA-TD500 dataset, it achieves an F-score of 0.772 at 720p resolution. On the ICDAR 2013 dataset, it achieves an F-score of 0.887 at 720p resolution. |
first_indexed | 2024-12-22T20:13:54Z |
format | Article |
id | doaj.art-2f0687dbd1194cbcbbc16648c0ece9ae |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-22T20:13:54Z |
publishDate | 2020-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-2f0687dbd1194cbcbbc16648c0ece9ae; 2022-12-21T18:14:00Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2020-01-01; vol. 8; pp. 155441-155449; DOI 10.1109/ACCESS.2020.3018784; document 9174729; FDTA: Fully Convolutional Scene Text Detection With Text Attention; Yongcun Cao (https://orcid.org/0000-0002-0125-4526), Shuaisen Ma (https://orcid.org/0000-0002-9399-5806), Haichuan Pan; School of Information Engineering, Minzu University of China, Beijing, China (all three authors); https://ieeexplore.ieee.org/document/9174729/; Scene text detection; full convolution network; DR Loss; convolutional neural network |
title | FDTA: Fully Convolutional Scene Text Detection With Text Attention |
topic | Scene text detection full convolution network DR Loss convolutional neural network |
url | https://ieeexplore.ieee.org/document/9174729/ |