Text-conditioned resampler for long form video understanding
In this paper we present a text-conditioned video resampler (TCR) module that uses a pre-trained and frozen visual encoder and large language model (LLM) to process long video sequences for a task. TCR localises relevant visual features from the video given a text condition and provides them to a LL...
Main Authors: | Korbar, B, Xian, Y, Tonioni, A, Zisserman, A, Tombari, F |
---|---|
Format: | Conference item |
Language: | English |
Published: | Springer, 2024 |
Similar Items
- SMOTETomek-Based Resampling for Personality Recognition
  by: Zhe Wang, et al.
  Published: (2019-01-01)
- Personalised CLIP or: how to find your vacation videos
  by: Korbar, B, et al.
  Published: (2022)
- Video Google: a text retrieval approach to object matching in videos
  by: Sivic, J, et al.
  Published: (2003)
- Revealing Traces of Image Resampling and Resampling Antiforensics
  by: Anjie Peng, et al.
  Published: (2017-01-01)
- Resampling schemes with low resampling intensity and their applications in testing hypotheses
  by: del Barrio, E, et al.
  Published: (2009)
- Efficient visual search of videos cast as text retrieval
  by: Sivic, J, et al.
  Published: (2008)