Synthetic data for text localisation in natural images
In this paper we introduce a new method for text detection in natural images. The method comprises two contributions. First, a fast and scalable engine for generating synthetic images of text in clutter: it overlays synthetic text onto existing background images in a natural way, accounting for the local 3D scene geometry. Second, we use the synthetic images to train a Fully-Convolutional Regression Network (FCRN), which efficiently performs text detection and bounding-box regression at all locations and multiple scales in an image. We discuss the relation of FCRN to the recently introduced YOLO detector, as well as to other end-to-end object detection systems based on deep learning. The resulting detection network significantly outperforms current methods for text detection in natural images, achieving an F-measure of 84.2% on the standard ICDAR 2013 benchmark, and it can process 15 images per second on a GPU.
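The abstract's first contribution is an engine that overlays rendered text onto background images while respecting local 3D scene geometry. The geometric core of any such overlay is a planar homography: render the text fronto-parallel, then map it onto the target scene plane. The sketch below is an illustrative reconstruction of that step only, not the authors' engine; the example point sets are hypothetical.

```python
import numpy as np

def homography_from_points(src, dst):
    """Direct Linear Transform: 3x3 homography H with dst ~ H @ src.

    src, dst: (4, 2) arrays of corresponding points.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    A = np.asarray(rows, dtype=float)
    _, _, Vt = np.linalg.svd(A)        # null vector of A = flattened H
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def warp_points(H, pts):
    """Apply homography H to an (N, 2) array of points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Map the corners of a fronto-parallel text rectangle onto a
# (hypothetical) planar region found in the background image;
# the same H would then drive a perspective warp of the text raster.
text_rect = np.array([(0, 0), (200, 0), (200, 40), (0, 40)], dtype=float)
scene_quad = np.array([(310, 120), (480, 135), (470, 180), (305, 160)], dtype=float)
H = homography_from_points(text_rect, scene_quad)
```

In practice the scene quad would come from the estimated local plane of a segmented background region, which is what ties the placement to the image's 3D geometry.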
Main Authors: | Gupta, A; Vedaldi, A; Zisserman, A |
---|---|
Format: | Internet publication |
Language: | English |
Published: | 2016 |
author | Gupta, A Vedaldi, A Zisserman, A |
---|---|
collection | OXFORD |
description | In this paper we introduce a new method for text detection in natural images. The method comprises two contributions. First, a fast and scalable engine for generating synthetic images of text in clutter: it overlays synthetic text onto existing background images in a natural way, accounting for the local 3D scene geometry. Second, we use the synthetic images to train a Fully-Convolutional Regression Network (FCRN), which efficiently performs text detection and bounding-box regression at all locations and multiple scales in an image. We discuss the relation of FCRN to the recently introduced YOLO detector, as well as to other end-to-end object detection systems based on deep learning. The resulting detection network significantly outperforms current methods for text detection in natural images, achieving an F-measure of 84.2% on the standard ICDAR 2013 benchmark, and it can process 15 images per second on a GPU. |
format | Internet publication |
id | oxford-uuid:ec71641e-646c-4921-a315-f5cf58cdf4ad |
institution | University of Oxford |
language | English |
publishDate | 2016 |
title | Synthetic data for text localisation in natural images |
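The FCRN described in the record regresses a text confidence and bounding-box parameters at every location of a convolutional feature grid, in the spirit of YOLO. A minimal, axis-aligned sketch of the decode step that turns such a dense grid into image-space boxes could look as follows. The 5-values-per-cell layout, names, and threshold are illustrative assumptions; the paper's actual network also regresses box orientation.

```python
import numpy as np

def decode_dense_predictions(pred, stride, conf_thresh=0.5):
    """Decode a dense grid of box predictions into image-space boxes.

    pred: (H, W, 5) array holding (conf, dx, dy, w, h) per grid cell,
    where (dx, dy) is the box centre offset within the cell in [0, 1]
    and (w, h) are box sizes in pixels.  stride: cell size in pixels.
    Returns a list of (x0, y0, x1, y1, conf) tuples.
    """
    boxes = []
    grid_h, grid_w, _ = pred.shape
    for gy in range(grid_h):
        for gx in range(grid_w):
            conf, dx, dy, w, h = pred[gy, gx]
            if conf < conf_thresh:
                continue  # cell predicts no text here
            cx = (gx + dx) * stride   # centre in image coordinates
            cy = (gy + dy) * stride
            boxes.append((cx - w / 2, cy - h / 2,
                          cx + w / 2, cy + h / 2, conf))
    return boxes
```

Because every cell emits a prediction, a single convolutional forward pass covers all locations at once; running the network over a pyramid of rescaled inputs then covers multiple scales, which is consistent with the efficiency the abstract reports.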