Efficient visual search of videos cast as text retrieval

Detailed bibliography
Main authors:	Sivic, J; Zisserman, A
Format:	Journal article
Language:	English
Published:	IEEE, 2008

author	Sivic, J; Zisserman, A
collection	OXFORD
description	We describe an approach to object retrieval which searches for and localizes all the occurrences of an object in a video, given a query image of the object. The object is represented by a set of viewpoint invariant region descriptors so that recognition can proceed successfully despite changes in viewpoint, illumination and partial occlusion. The temporal continuity of the video within a shot is used to track the regions in order to reject those that are unstable. Efficient retrieval is achieved by employing methods from statistical text retrieval, including inverted file systems, and text and document frequency weightings. This requires a visual analogue of a word, which is provided here by vector quantizing the region descriptors. The final ranking also depends on the spatial layout of the regions. The result is that retrieval is immediate, returning a ranked list of shots in the manner of Google. We report results for object retrieval on the full-length feature films 'Groundhog Day', 'Casablanca' and 'Run Lola Run', including searches from within the movie and specified by external images downloaded from the Internet. We investigate retrieval performance with respect to different quantizations of region descriptors and compare the performance of several ranking measures. Performance is also compared to a baseline method implementing standard frame-to-frame matching.
first_indexed	2024-03-07T06:10:50Z
format	Journal article
id	oxford-uuid:ef6ef3f0-aed8-4bdf-a6fb-52405c23401d
institution	University of Oxford
language	English
last_indexed	2025-02-19T04:33:58Z
publishDate	2008
publisher	IEEE
record_format	dspace
title	Efficient visual search of videos cast as text retrieval
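
The `description` field outlines a text-retrieval core: region descriptors are vector-quantized into "visual words", shots are indexed in an inverted file, and matches are ranked with term- and document-frequency weighting. The Python sketch below illustrates only that core and is not the authors' implementation. It assumes that the vocabulary (for example, cluster centres from k-means) is already given, that weights take the standard tf-idf form weight = (n_id / n_d) * log(N / N_i), and that shots are ranked by cosine similarity, which is one of several possible ranking measures. The names `quantize` and `InvertedFileIndex` are hypothetical. Region detection, tracking, and the spatial-layout re-ranking mentioned in the abstract are omitted.

```python
import math
from collections import Counter, defaultdict


def quantize(descriptor, vocabulary):
    """Hypothetical helper: return the index of the nearest centre (the visual word)."""
    best, best_d = 0, float("inf")
    for i, centre in enumerate(vocabulary):
        d = sum((a - b) ** 2 for a, b in zip(descriptor, centre))
        if d < best_d:
            best, best_d = i, d
    return best


class InvertedFileIndex:
    """Hypothetical tf-idf index over shots; assumes a precomputed vocabulary."""

    def __init__(self, vocabulary):
        self.vocabulary = vocabulary
        self.postings = defaultdict(list)  # visual word -> [(shot_id, count), ...]
        self.doc_len = {}                  # shot_id -> total visual words n_d
        self.norms = {}                    # shot_id -> L2 norm of its tf-idf vector

    def add(self, shot_id, descriptors):
        counts = Counter(quantize(d, self.vocabulary) for d in descriptors)
        for word, n in counts.items():
            self.postings[word].append((shot_id, n))
        self.doc_len[shot_id] = sum(counts.values())

    def idf(self, word):
        # log(N / N_i): N shots in total, N_i of them contain the word.
        df = len(self.postings.get(word, ()))
        return math.log(len(self.doc_len) / df) if df else 0.0

    def build(self):
        # Call once after all shots are added, so that N and N_i are final.
        sq = defaultdict(float)
        for word, plist in self.postings.items():
            w_idf = self.idf(word)
            for shot_id, n in plist:
                sq[shot_id] += (n / self.doc_len[shot_id] * w_idf) ** 2
        self.norms = {s: math.sqrt(sq[s]) for s in self.doc_len}

    def query(self, descriptors, top_k=5):
        counts = Counter(quantize(d, self.vocabulary) for d in descriptors)
        n_q = sum(counts.values())
        q = {w: n / n_q * self.idf(w) for w, n in counts.items()}
        q_norm = math.sqrt(sum(v * v for v in q.values())) or 1.0
        # Only shots sharing at least one visual word with the query are touched.
        scores = defaultdict(float)
        for word, qw in q.items():
            w_idf = self.idf(word)
            for shot_id, n in self.postings.get(word, ()):
                scores[shot_id] += qw * (n / self.doc_len[shot_id] * w_idf)
        ranked = [(s, v / (q_norm * self.norms[s]))
                  for s, v in scores.items() if self.norms[s] > 0]
        ranked.sort(key=lambda item: item[1], reverse=True)
        return ranked[:top_k]


if __name__ == "__main__":
    vocab = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]  # toy 2-D "descriptors"
    index = InvertedFileIndex(vocab)
    index.add("shot_a", [(0.1, 0.0), (9.8, 0.2), (9.9, 0.0)])
    index.add("shot_b", [(0.0, 9.7), (0.2, 10.1)])
    index.add("shot_c", [(0.3, 0.1), (0.0, 9.9)])
    index.build()
    print(index.query([(0.0, 10.0), (0.1, 9.9)]))  # shot_b ranks above shot_c
```

The inverted file is what makes retrieval fast: a query only visits the postings lists of its own visual words, rather than comparing against every frame as in frame-to-frame matching.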