Efficient visual search for objects in videos

We describe an approach to generalize the concept of text-based search to nontextual information. In particular, we elaborate on the possibilities of retrieving objects or scenes in a movie with the ease, speed, and accuracy with which Google retrieves web pages containing particular words, by speci...

Bibliographic Details
Main Authors: Sivic, J; Zisserman, A
Format: Journal article
Language: English
Published: IEEE 2008
_version_ 1824458868518813696
author Sivic, J
Zisserman, A
author_facet Sivic, J
Zisserman, A
author_sort Sivic, J
collection OXFORD
description We describe an approach to generalize the concept of text-based search to nontextual information. In particular, we elaborate on the possibilities of retrieving objects or scenes in a movie with the ease, speed, and accuracy with which Google retrieves web pages containing particular words, by specifying the query as an image of the object or scene. In our approach, each frame of the video is represented by a set of viewpoint invariant region descriptors. These descriptors enable recognition to proceed successfully despite changes in viewpoint, illumination, and partial occlusion. Vector quantizing these region descriptors provides a visual analogy of a word, which we term a “visual word.” Efficient retrieval is then achieved by employing methods from statistical text retrieval, including inverted file systems, and text and document frequency weightings. The final ranking also depends on the spatial layout of the regions. Object retrieval results are reported on the full length feature films “Groundhog Day,” “Charade,” and “Pretty Woman,” including searches from within the movie and also searches specified by external images downloaded from the Internet. We discuss three research directions for the presented video retrieval approach and review some recent work addressing them: 1) building visual vocabularies for very large-scale retrieval; 2) retrieval of 3-D objects; and 3) more thorough verification and ranking using the spatial structure of objects.
first_indexed 2025-02-19T04:32:44Z
format Journal article
id oxford-uuid:26a8f32b-2a4f-4c34-b1b0-ad1c569e8daa
institution University of Oxford
language English
last_indexed 2025-02-19T04:32:44Z
publishDate 2008
publisher IEEE
record_format dspace
spelling oxford-uuid:26a8f32b-2a4f-4c34-b1b0-ad1c569e8daa2025-01-17T15:56:15ZEfficient visual search for objects in videosJournal articlehttp://purl.org/coar/resource_type/c_dcae04bcuuid:26a8f32b-2a4f-4c34-b1b0-ad1c569e8daaEnglishSymplectic ElementsIEEE2008Sivic, JZisserman, AWe describe an approach to generalize the concept of text-based search to nontextual information. In particular, we elaborate on the possibilities of retrieving objects or scenes in a movie with the ease, speed, and accuracy with which Google retrieves web pages containing particular words, by specifying the query as an image of the object or scene. In our approach, each frame of the video is represented by a set of viewpoint invariant region descriptors. These descriptors enable recognition to proceed successfully despite changes in viewpoint, illumination, and partial occlusion. Vector quantizing these region descriptors provides a visual analogy of a word, which we term a “visual word.” Efficient retrieval is then achieved by employing methods from statistical text retrieval, including inverted file systems, and text and document frequency weightings. The final ranking also depends on the spatial layout of the regions. Object retrieval results are reported on the full length feature films “Groundhog Day,” “Charade,” and “Pretty Woman,” including searches from within the movie and also searches specified by external images downloaded from the Internet. We discuss three research directions for the presented video retrieval approach and review some recent work addressing them: 1) building visual vocabularies for very large-scale retrieval; 2) retrieval of 3-D objects; and 3) more thorough verification and ranking using the spatial structure of objects.
spellingShingle Sivic, J
Zisserman, A
Efficient visual search for objects in videos
title Efficient visual search for objects in videos
title_full Efficient visual search for objects in videos
title_fullStr Efficient visual search for objects in videos
title_full_unstemmed Efficient visual search for objects in videos
title_short Efficient visual search for objects in videos
title_sort efficient visual search for objects in videos
work_keys_str_mv AT sivicj efficientvisualsearchforobjectsinvideos
AT zissermana efficientvisualsearchforobjectsinvideos
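
Illustrative note: the retrieval steps named in the description (vector quantizing descriptors into visual words, an inverted file, and frequency weighting) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the toy 2-D descriptors, and the particular tf-idf form are all hypothetical choices made for illustration. The vocabulary is assumed to be given (in practice it would come from clustering real region descriptors), and the spatial-layout re-ranking step is omitted.

```python
import math
from collections import Counter, defaultdict


def nearest_word(descriptor, vocabulary):
    """Vector quantization: index of the closest vocabulary centre
    (squared Euclidean distance)."""
    return min(
        range(len(vocabulary)),
        key=lambda i: sum((d - c) ** 2 for d, c in zip(descriptor, vocabulary[i])),
    )


def build_index(frames, vocabulary):
    """frames maps frame_id -> list of descriptors.
    Returns (inverted_file, idf, frame_norms)."""
    counts = {
        fid: Counter(nearest_word(d, vocabulary) for d in descs)
        for fid, descs in frames.items()
    }
    n_frames = len(frames)
    # Document frequency: number of frames containing each visual word.
    doc_freq = Counter(w for c in counts.values() for w in c)
    idf = {w: math.log(n_frames / doc_freq[w]) for w in doc_freq}

    inverted = defaultdict(dict)  # visual word -> {frame_id: tf-idf weight}
    norms = {}
    for fid, c in counts.items():
        total = sum(c.values())
        weights = {w: (n / total) * idf[w] for w, n in c.items()}
        norms[fid] = math.sqrt(sum(v * v for v in weights.values()))
        for w, v in weights.items():
            if v > 0:  # words present in every frame carry no weight
                inverted[w][fid] = v
    return inverted, idf, norms


def search(query_descriptors, vocabulary, inverted, idf, norms):
    """Rank frames by cosine similarity of tf-idf vectors, visiting only
    the frames listed under the query's visual words."""
    c = Counter(nearest_word(d, vocabulary) for d in query_descriptors)
    total = sum(c.values())
    q = {w: (n / total) * idf.get(w, 0.0) for w, n in c.items()}
    q_norm = math.sqrt(sum(v * v for v in q.values()))

    scores = defaultdict(float)
    for w, qv in q.items():
        if qv == 0:
            continue
        for fid, fv in inverted.get(w, {}).items():
            scores[fid] += qv * fv
    ranked = [(fid, s / (q_norm * norms[fid])) for fid, s in scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Toy data: three visual words and three frames (all values hypothetical).
vocabulary = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
frames = {
    "f1": [(0.1, 0.2), (9.8, 10.1)],
    "f2": [(0.2, 0.1), (0.1, 9.9)],
    "f3": [(0.0, 0.3), (0.2, 0.0)],
}
inverted, idf, norms = build_index(frames, vocabulary)
results = search([(10.0, 9.9)], vocabulary, inverted, idf, norms)
```

In this sketch the inverted file means a query touches only frames that share at least one weighted visual word with it, which is the property that makes text-style retrieval fast. A visual word that occurs in every frame receives zero weight and so never contributes to a match.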