Unconstrained Bilingual Scene Text Reading Using Octave as a Feature Extractor

Reading text and unified text detection and recognition from natural images are the most challenging applications in computer vision and document analysis. Previously proposed end-to-end scene text reading methods do not consider the frequency of input images at feature extraction, which slows down...


Bibliographic Details
Main Authors:	Direselign Addis Tadesse, Chuan-Ming Liu, Van-Dai Ta
Format:	Article
Language:	English
Published:	MDPI AG 2020-06-01
Series:	Applied Sciences
Subjects:	octave convolution; bilingual scene text reading; Ethiopic script; attention
Online Access:	https://www.mdpi.com/2076-3417/10/13/4474

author	Direselign Addis Tadesse Chuan-Ming Liu Van-Dai Ta
collection	DOAJ
description	Reading text in natural images, and unifying text detection and recognition, are among the most challenging applications in computer vision and document analysis. Previously proposed end-to-end scene text reading methods do not consider the frequency of input images at feature extraction, which slows down the system, requires more memory, and leads to inaccurate text recognition. In this paper, we propose an octave convolution (OctConv) feature extractor and a time-restricted attention encoder-decoder module for end-to-end scene text reading. OctConv extracts features by factorizing the input image according to frequency. It is a direct replacement for convolutions, orthogonal and complementary, that reduces redundancy and helps text reading run faster with lower memory requirements. In the text reading process, features are first extracted from the input image using a Feature Pyramid Network (FPN) with an OctConv Residual Network of depth 50 (ResNet50). Then, a Region Proposal Network (RPN) uses the extracted features to predict the location of the text area. Finally, a time-restricted attention encoder-decoder module is applied after Region of Interest (RoI) pooling. A bilingual scene text dataset of real and synthetic images is prepared for training and testing the proposed model. Additionally, well-known datasets including ICDAR2013, ICDAR2015, and Total Text are used for fine-tuning and for comparing its performance against previously proposed state-of-the-art methods. The proposed model shows promising results on detection and reading tasks for both regular text and irregular or curved text.
first_indexed	2024-03-10T18:49:58Z
format	Article
id	doaj.art-02b43a26325749419f505555ed99b838
institution	Directory Open Access Journal
issn	2076-3417
language	English
last_indexed	2024-03-10T18:49:58Z
publishDate	2020-06-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-02b43a26325749419f505555ed99b838; 2023-11-20T05:11:50Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2020-06-01; vol. 10, no. 13, art. 4474; DOI 10.3390/app10134474; Unconstrained Bilingual Scene Text Reading Using Octave as a Feature Extractor; Direselign Addis Tadesse, Chuan-Ming Liu, Van-Dai Ta (all: Department of Computer Science and Information Engineering, National Taipei University of Technology (Taipei Tech), Taipei 106, Taiwan)
title	Unconstrained Bilingual Scene Text Reading Using Octave as a Feature Extractor
topic	octave convolution; bilingual scene text reading; Ethiopic script; attention
url	https://www.mdpi.com/2076-3417/10/13/4474

Similar Items