Efficient visual search for objects in videos

We describe an approach to generalize the concept of text-based search to nontextual information. In particular, we elaborate on the possibilities of retrieving objects or scenes in a movie with the ease, speed, and accuracy with which Google retrieves web pages containing particular words, by speci...


Bibliographic details
Main Authors:	Sivic, J, Zisserman, A
Format:	Journal article
Language:	English
Published:	IEEE 2008

_version_	1826317070952824832
author	Sivic, J Zisserman, A
author_facet	Sivic, J Zisserman, A
author_sort	Sivic, J
collection	OXFORD
description	We describe an approach to generalize the concept of text-based search to nontextual information. In particular, we elaborate on the possibilities of retrieving objects or scenes in a movie with the ease, speed, and accuracy with which Google retrieves web pages containing particular words, by specifying the query as an image of the object or scene. In our approach, each frame of the video is represented by a set of viewpoint invariant region descriptors. These descriptors enable recognition to proceed successfully despite changes in viewpoint, illumination, and partial occlusion. Vector quantizing these region descriptors provides a visual analogy of a word, which we term a "visual word." Efficient retrieval is then achieved by employing methods from statistical text retrieval, including inverted file systems, and text and document frequency weightings. The final ranking also depends on the spatial layout of the regions. Object retrieval results are reported on the full length feature films "Groundhog Day," "Charade," and "Pretty Woman," including searches from within the movie and also searches specified by external images downloaded from the Internet. We discuss three research directions for the presented video retrieval approach and review some recent work addressing them: 1) building visual vocabularies for very large-scale retrieval; 2) retrieval of 3-D objects; and 3) more thorough verification and ranking using the spatial structure of objects.
first_indexed	2025-02-19T04:32:44Z
format	Journal article
id	oxford-uuid:26a8f32b-2a4f-4c34-b1b0-ad1c569e8daa
institution	University of Oxford
language	English
last_indexed	2025-02-19T04:32:44Z
publishDate	2008
publisher	IEEE
record_format	dspace
spelling	oxford-uuid:26a8f32b-2a4f-4c34-b1b0-ad1c569e8daa 2025-01-17T15:56:15Z Efficient visual search for objects in videos Journal article http://purl.org/coar/resource_type/c_dcae04bc uuid:26a8f32b-2a4f-4c34-b1b0-ad1c569e8daa English Symplectic Elements IEEE 2008 Sivic, J Zisserman, A We describe an approach to generalize the concept of text-based search to nontextual information. In particular, we elaborate on the possibilities of retrieving objects or scenes in a movie with the ease, speed, and accuracy with which Google retrieves web pages containing particular words, by specifying the query as an image of the object or scene. In our approach, each frame of the video is represented by a set of viewpoint invariant region descriptors. These descriptors enable recognition to proceed successfully despite changes in viewpoint, illumination, and partial occlusion. Vector quantizing these region descriptors provides a visual analogy of a word, which we term a "visual word." Efficient retrieval is then achieved by employing methods from statistical text retrieval, including inverted file systems, and text and document frequency weightings. The final ranking also depends on the spatial layout of the regions. Object retrieval results are reported on the full length feature films "Groundhog Day," "Charade," and "Pretty Woman," including searches from within the movie and also searches specified by external images downloaded from the Internet. We discuss three research directions for the presented video retrieval approach and review some recent work addressing them: 1) building visual vocabularies for very large-scale retrieval; 2) retrieval of 3-D objects; and 3) more thorough verification and ranking using the spatial structure of objects.
spellingShingle	Sivic, J Zisserman, A Efficient visual search for objects in videos
title	Efficient visual search for objects in videos
title_full	Efficient visual search for objects in videos
title_fullStr	Efficient visual search for objects in videos
title_full_unstemmed	Efficient visual search for objects in videos
title_short	Efficient visual search for objects in videos
title_sort	efficient visual search for objects in videos
work_keys_str_mv	AT sivicj efficientvisualsearchforobjectsinvideos AT zissermana efficientvisualsearchforobjectsinvideos
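
The retrieval pipeline summarized in the description field (quantize region descriptors into visual words, then rank frames with an inverted file and frequency weightings) can be illustrated with a small sketch. This is not the authors' implementation: the toy 2-D descriptors, the fixed three-centroid vocabulary, the tf-idf formula, the cosine scoring, and every function name below are assumptions chosen for illustration, and the spatial-layout re-ranking mentioned in the abstract is omitted.

```python
import math
from collections import Counter, defaultdict

def quantize(descriptor, vocabulary):
    """Map a descriptor to the index of its nearest centroid (its 'visual word')."""
    return min(range(len(vocabulary)),
               key=lambda i: sum((d - c) ** 2
                                 for d, c in zip(descriptor, vocabulary[i])))

def build_index(frames, vocabulary):
    """Build an inverted file: visual word -> {frame id: tf-idf weight}."""
    counts = {fid: Counter(quantize(d, vocabulary) for d in descs)
              for fid, descs in frames.items()}
    doc_freq = Counter(w for c in counts.values() for w in c)
    # Assumed idf form: log(number of frames / frames containing the word).
    idf = {w: math.log(len(frames) / df) for w, df in doc_freq.items()}
    inverted, norms = defaultdict(dict), {}
    for fid, c in counts.items():
        total = sum(c.values())
        weights = {w: (n / total) * idf[w] for w, n in c.items()}
        norms[fid] = math.sqrt(sum(v * v for v in weights.values())) or 1.0
        for w, v in weights.items():
            inverted[w][fid] = v
    return inverted, idf, norms

def search(query_descriptors, vocabulary, inverted, idf, norms):
    """Rank frames by cosine similarity, touching only postings of query words."""
    c = Counter(quantize(d, vocabulary) for d in query_descriptors)
    total = sum(c.values())
    q = {w: (n / total) * idf.get(w, 0.0) for w, n in c.items()}
    q_norm = math.sqrt(sum(v * v for v in q.values())) or 1.0
    scores = defaultdict(float)
    for w, qv in q.items():
        for fid, fv in inverted.get(w, {}).items():
            scores[fid] += qv * fv
    return sorted(((s / (q_norm * norms[fid]), fid)
                   for fid, s in scores.items()), reverse=True)

# Hypothetical toy data: three centroids and three "frames" of 2-D descriptors.
vocabulary = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
frames = {
    "f1": [(0.1, 0.2), (9.8, 0.1)],
    "f2": [(0.2, 9.9), (0.1, 10.2)],
    "f3": [(9.9, 0.3), (0.3, 9.7)],
}
inverted, idf, norms = build_index(frames, vocabulary)
results = search([(10.0, 0.0), (0.0, 10.0)], vocabulary, inverted, idf, norms)
```

In this toy run the query contains one descriptor near each of the second and third centroids, so the frame holding both of those words ("f3") ranks first. A real system would learn the vocabulary by clustering a large set of descriptors rather than fixing it by hand.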
