CSFF-Net: Scene Text Detection Based on Cross-Scale Feature Fusion

In recent years, methods for detecting text in real scenes have made significant progress with the rise of neural networks. However, due to the limited receptive field of the convolutional neural network (CNN) and the simple representation of text by rectangular bounding boxes, the previous...

Bibliographic Details
Main Authors: Yuan Li, Mayire Ibrayim, Askar Hamdulla
Format: Article
Language: English
Published: MDPI AG 2021-12-01
Series: Information
Subjects: feature extraction, attention mechanism, pyramid network, deep learning, text detection
Online Access: https://www.mdpi.com/2078-2489/12/12/524
_version_ 1797503677419749376
author Yuan Li
Mayire Ibrayim
Askar Hamdulla
author_facet Yuan Li
Mayire Ibrayim
Askar Hamdulla
author_sort Yuan Li
collection DOAJ
description In recent years, methods for detecting text in real scenes have made significant progress with the rise of neural networks. However, due to the limited receptive field of the convolutional neural network (CNN) and the simple representation of text by rectangular bounding boxes, previous methods may be insufficient for more challenging instances of text. To solve this problem, this paper proposes a scene text detection network based on cross-scale feature fusion (CSFF-Net). The framework is built on the lightweight backbone network ResNet, and feature learning is enhanced by embedding the depth weighted convolution module (DWCM) while retaining the original feature information extracted by the CNN. In addition, a 3D-Attention module is introduced to merge the context information of adjacent areas and so refine the features at each spatial scale. Furthermore, because the Feature Pyramid Network (FPN) handles cross-layer information flow by simple element-wise addition, which cannot fully resolve the interdependence between layers, this paper introduces a Cross-Level Feature Fusion Module (CLFFM) on top of FPN; the resulting network is called the Cross-Level Feature Pyramid Network (Cross-Level FPN). The proposed CLFFM handles cross-layer information flow better and outputs detailed feature information, thus improving the accuracy of text region detection. Compared with the original network framework, the proposed framework performs better at detecting text in images of complex scenes, and extensive experiments on three challenging datasets validate the feasibility of the approach.
first_indexed 2024-03-10T03:54:08Z
format Article
id doaj.art-e02ab6e84d0e494791f4ebcd2f32a820
institution Directory of Open Access Journals
issn 2078-2489
language English
last_indexed 2024-03-10T03:54:08Z
publishDate 2021-12-01
publisher MDPI AG
record_format Article
series Information
spelling doaj.art-e02ab6e84d0e494791f4ebcd2f32a820; 2023-11-23T08:51:43Z; eng; MDPI AG; Information; ISSN 2078-2489; 2021-12-01; vol. 12, no. 12, art. 524; DOI 10.3390/info12120524; CSFF-Net: Scene Text Detection Based on Cross-Scale Feature Fusion; Yuan Li, Mayire Ibrayim, Askar Hamdulla (all: College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China); https://www.mdpi.com/2078-2489/12/12/524; keywords: feature extraction, attention mechanism, pyramid network, deep learning, text detection
title CSFF-Net: Scene Text Detection Based on Cross-Scale Feature Fusion
title_sort csff net scene text detection based on cross scale feature fusion
topic feature extraction
attention mechanism
pyramid network
deep learning
text detection
url https://www.mdpi.com/2078-2489/12/12/524
work_keys_str_mv AT yuanli csffnetscenetextdetectionbasedoncrossscalefeaturefusion
AT mayireibrayim csffnetscenetextdetectionbasedoncrossscalefeaturefusion
AT askarhamdulla csffnetscenetextdetectionbasedoncrossscalefeaturefusion