CSFF-Net: Scene Text Detection Based on Cross-Scale Feature Fusion


Bibliographic Details
Main Authors:	Yuan Li, Mayire Ibrayim, Askar Hamdulla
Format:	Article
Language:	English
Published:	MDPI AG 2021-12-01
Series:	Information
Subjects:	feature extraction; attention mechanism; pyramid network; deep learning; text detection
Online Access:	https://www.mdpi.com/2078-2489/12/12/524

collection	DOAJ
description	In recent years, methods for detecting text in real-world scenes have made significant progress with the development of neural networks. However, because of the limited receptive field of convolutional neural networks (CNNs) and the simple representation of text as rectangular bounding boxes, previous methods may be insufficient for more challenging text instances. To address this problem, this paper proposes a scene text detection network based on cross-scale feature fusion (CSFF-Net). The framework is built on the lightweight ResNet backbone and enhances feature learning by embedding a depth weighted convolution module (DWCM) while retaining the original feature information extracted by the CNN. In addition, a 3D-Attention module is introduced to merge context information from adjacent areas and thereby refine the features at each spatial scale. Because the Feature Pyramid Network (FPN) processes cross-layer information flow by simple element-wise addition, it cannot fully resolve the interdependence between levels; this paper therefore introduces a Cross-Level Feature Fusion Module (CLFFM) on top of FPN, yielding the Cross-Level Feature Pyramid Network (Cross-Level FPN). The proposed CLFFM handles cross-layer information flow better and outputs more detailed feature information, thus improving the accuracy of text region detection. Compared with the baseline framework, the proposed framework performs better at detecting text in complex scene images, and extensive experiments on three challenging datasets validate the feasibility of our approach.
first_indexed	2024-03-10T03:54:08Z
id	doaj.art-e02ab6e84d0e494791f4ebcd2f32a820
institution	Directory Open Access Journal
issn	2078-2489
last_indexed	2024-03-10T03:54:08Z
doi	10.3390/info12120524
volume	12
issue	12
article	524
affiliation	College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China (Yuan Li, Mayire Ibrayim, Askar Hamdulla)
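The abstract motivates CLFFM by noting that a plain FPN merges cross-layer information only by element-wise addition. As background, the following is a minimal NumPy sketch of that plain FPN top-down pass (1x1 lateral projection, nearest-neighbour 2x upsampling, element-wise addition). It is not CSFF-Net's CLFFM, DWCM or 3D-Attention, whose internals are not described in this record. The function names, channel widths, map sizes and random weights are hypothetical, and the 3x3 smoothing convolution applied after each merge in the original FPN is omitted.

```python
import numpy as np


def conv1x1(x, weight):
    """1x1 convolution: mixes channels independently at every pixel.

    x: (C_in, H, W) feature map; weight: (C_out, C_in) matrix.
    """
    return np.einsum("oc,chw->ohw", weight, x)


def upsample2x_nearest(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def fpn_top_down(features, lateral_weights):
    """Plain FPN top-down merge (hypothetical helper for illustration).

    features: backbone maps ordered fine -> coarse, each (C_i, H_i, W_i),
              with H_i and W_i halving from one level to the next.
    lateral_weights: (D, C_i) matrices projecting each level to D channels.
    Returns merged maps ordered fine -> coarse, each (D, H_i, W_i).
    """
    laterals = [conv1x1(f, w) for f, w in zip(features, lateral_weights)]
    merged = [laterals[-1]]  # the coarsest level starts the top-down pass
    for lateral in reversed(laterals[:-1]):
        # The cross-layer step the abstract refers to: upsample the coarser
        # merged map and combine it with the lateral map by plain addition.
        merged.append(lateral + upsample2x_nearest(merged[-1]))
    return merged[::-1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    channels = [64, 128, 256, 512]  # hypothetical ResNet-18-style stage widths
    sizes = [32, 16, 8, 4]          # hypothetical spatial sizes, fine -> coarse
    feats = [rng.standard_normal((c, s, s)) for c, s in zip(channels, sizes)]
    weights = [rng.standard_normal((256, c)) * 0.01 for c in channels]
    outs = fpn_top_down(feats, weights)
    print([o.shape for o in outs])
    # [(256, 32, 32), (256, 16, 16), (256, 8, 8), (256, 4, 4)]
```

Every level ends up with the same channel count, and each finer map is simply its lateral projection plus the upsampled coarser result; this fixed additive fusion is the behaviour the paper's Cross-Level FPN is designed to improve on.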