A new visual signature for content-based indexing of low resolution documents

This paper proposes a new visual signature for content-based indexing of low-resolution documents. Camera-Based Document Analysis and Recognition (CBDAR) is an established field that deals with textual information in scene images taken by low-cost handheld devices such as digital cameras, cell p...

Bibliographic details
Main authors: Md Nor, Danial; Abd. Wahab, M. Helmy; M. Jenu, M. Zarar; Ogier, Jean-Marc
Format: Article
Language: English
Published: 2012
Subjects: T Technology (General)
Online access: http://eprints.uthm.edu.my/7097/1/J14168_5130d0b6fdee9bb0e61a4edec1d3837d.pdf
author Md Nor, Danial
Abd. Wahab, M. Helmy
M. Jenu, M. Zarar
Ogier, Jean-Marc
collection UTHM
description This paper proposes a new visual signature for content-based indexing of low-resolution documents. Camera-Based Document Analysis and Recognition (CBDAR) is an established field that deals with textual information in scene images taken by low-cost handheld devices such as digital cameras, cell phones, etc. Many applications, such as text translation, reading text for visually impaired and blind people, information retrieval from media documents, and e-learning, can be built using techniques developed in the CBDAR domain. The proposed approach to extracting textual information consists of three steps: image segmentation, text localization and extraction, and Optical Character Recognition (OCR). In pre-processing, the resolution of each image is checked and the image is re-sampled to a common resolution (720 × 540). The resulting image is then converted to grayscale and binarized using Otsu's method for further processing. In addition, the mean horizontal run length of both black and white pixels is used to check that foreground objects are properly segmented. In the post-processing step, the text localizer validates the candidate text regions proposed by the text detector. We employed a connected-component approach for text localization. The extracted text is then successfully recognized using ABBYY FineReader for OCR. Apart from OCR, we created novel feature vectors from the textual information for Content-Based Image Retrieval (CBIR).
first_indexed 2024-03-05T21:55:43Z
format Article
id uthm.eprints-7097
institution Universiti Tun Hussein Onn Malaysia
language English
last_indexed 2024-03-05T21:55:43Z
publishDate 2012
record_format dspace
spelling uthm.eprints-7097 2022-06-08T02:05:28Z http://eprints.uthm.edu.my/7097/ T Technology (General) 2012 Article PeerReviewed text en Md Nor, Danial and Abd. Wahab, M. Helmy and M. Jenu, M. Zarar and Ogier, Jean-Marc (2012) A new visual signature for content-based indexing of low resolution documents. Journal of Information Retrieval and Knowledge Management, 2. pp. 88-95.
topic T Technology (General)
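
The pre-processing described in the abstract (Otsu binarization followed by a check on the mean horizontal run length of black and white pixels) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `otsu_threshold`, `binarize` and `mean_horizontal_run_length` are hypothetical, the image is assumed to be an 8-bit grayscale image stored as a list of rows, and text is assumed to be dark on a light background. How the run-length value is turned into an accept/reject decision is not stated in the record and is left out.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return the Otsu threshold t for 8-bit values (class 0: <= t, class 1: > t)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of pixel values <= t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue        # no pixels in the lower class yet
        if w1 == 0:
            break           # upper class is empty from here on
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        # Between-class variance (up to a constant factor of total**2).
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image: List[List[int]], t: int) -> List[List[int]]:
    """Map dark pixels (<= t) to 1 (black/foreground) and the rest to 0 (white)."""
    return [[1 if px <= t else 0 for px in row] for row in image]


def mean_horizontal_run_length(binary: List[List[int]], value: int) -> float:
    """Average length of horizontal runs of `value` over all rows."""
    runs = []
    for row in binary:
        length = 0
        for px in row:
            if px == value:
                length += 1
            elif length:
                runs.append(length)
                length = 0
        if length:              # run that reaches the end of the row
            runs.append(length)
    return sum(runs) / len(runs) if runs else 0.0
```

A caller would flatten the grayscale image to compute the threshold, binarize it, and then compare the mean run lengths for `value=1` (black) and `value=0` (white) against limits chosen for the document collection at hand.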