Fast text image detection

Bibliographic Details
Main Author: Devadeep Shyam
Other Authors: Kot Chichung, Alex
Format: Thesis
Language: English
Published: 2015
Subjects:
Online Access: https://hdl.handle.net/10356/65349
Description
Summary: The Internet offers a broad platform for people to share information and opinions. Illegal or sensitive commentary in written form is easily blocked by text filters. However, it is difficult to automatically filter out such articles when they are embedded in and propagated via images. To prohibit the dissemination of such commentary among a large number of images, it is useful for the government to detect whether an image contains a sufficient amount of text. In this thesis, we propose a detection system that determines whether an image contains paragraphs. First, we propose a histogram-based method to filter out images that contain text paragraphs in horizontal orientation. We then propose a method based on the Hough transform to detect text paragraphs in arbitrary orientation among the remaining images, in which no horizontal paragraphs were found. To achieve better performance and to detect text images whose text has arbitrary orientation, the detection system combines the two proposed methods. To imitate this scenario, we construct a new dataset of more than 2000 images, both with and without paragraphs. Extensive experiments on the dataset demonstrate the effectiveness and practicality of the proposed detection system.
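
The summary does not say how the histogram-based stage works. One common reading of such a stage is a horizontal projection profile: count the dark pixels in each row and look for several bands of ink separated by blank gaps. The sketch below illustrates that general idea only and is not the thesis's method. All function names, thresholds, and the synthetic test image are assumptions made for illustration.

```python
from typing import List, Sequence

# Assumed input: a grayscale image as rows of ints, 0 = black, 255 = white.
Image = Sequence[Sequence[int]]


def row_profile(image: Image, dark_threshold: int = 128) -> List[int]:
    """Count dark pixels in each row (a horizontal projection histogram)."""
    return [sum(1 for px in row if px < dark_threshold) for row in image]


def count_text_lines(profile: Sequence[int], min_ink: int = 1,
                     min_height: int = 2) -> int:
    """Count runs of at least min_height consecutive rows that each hold
    at least min_ink dark pixels. Each run is treated as one text line."""
    lines = 0
    run = 0
    for ink in profile:
        if ink >= min_ink:
            run += 1
        else:
            if run >= min_height:
                lines += 1
            run = 0
    if run >= min_height:  # close a run that reaches the last row
        lines += 1
    return lines


def looks_like_paragraph(image: Image, min_lines: int = 3) -> bool:
    """Hypothetical decision rule: several stacked ink bands suggest a paragraph."""
    return count_text_lines(row_profile(image)) >= min_lines


def synthetic_page(n_lines: int, width: int = 40) -> List[List[int]]:
    """Toy image: n_lines dark bands, 3 rows tall, separated by 2 white rows."""
    white = [255] * width
    band = [255] * 5 + [0] * (width - 10) + [255] * 5
    page = [list(white) for _ in range(2)]
    for _ in range(n_lines):
        page += [list(band) for _ in range(3)]
        page += [list(white) for _ in range(2)]
    return page
```

A profile of this kind only separates lines that run roughly parallel to the image rows, which is consistent with the summary's use of a second, Hough-transform-based stage for text in arbitrary orientation.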