TAPIR: tracking any point with per-frame initialization and temporal refinement


Detailed description

Bibliographic details

Main authors: Doersch, C, Yang, Y, Vecerik, M, Gokay, D, Gupta, A, Aytar, Y, Carreira, J, Zisserman, A
Format: Conference item
Language: English
Published: IEEE, 2024
Description: We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence. Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations. The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS. Our model facilitates fast inference on long and high-resolution video sequences. On a modern GPU, our implementation has the capacity to track points faster than real-time. Given the high-quality trajectories extracted from a large dataset, we demonstrate a proof-of-concept diffusion model which generates trajectories from static images, enabling plausible animations. Visualizations, source code, and pretrained models can be found at https://deepmind-tapir.github.io.
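The two-stage pipeline in the abstract (per-frame matching by feature similarity, then iterative refinement over local correlations with temporal coupling) can be illustrated with a minimal NumPy sketch. This is not TAPIR's learned architecture: the dot-product similarity, window radius, update step, and smoothing weights below are illustrative assumptions standing in for the model's trained components.

```python
import numpy as np

def match_stage(query_feat, frame_feats):
    """Stage 1 (per-frame initialization): independently pick, in every
    frame, the location whose feature has the highest dot-product
    similarity with the query feature.
    query_feat: (C,), frame_feats: (T, H, W, C) -> (T, 2) as (row, col)."""
    T, H, W, C = frame_feats.shape
    scores = frame_feats.reshape(T, H * W, C) @ query_feat          # (T, H*W)
    flat = scores.argmax(axis=1)
    return np.stack([flat // W, flat % W], axis=1).astype(float)

def refine_stage(track, query_feat, frame_feats, iters=4, radius=2, step=0.5):
    """Stage 2 (temporal refinement), heavily simplified: nudge each
    frame's estimate toward the best peak in a local correlation window,
    then smooth the trajectory over time so neighboring frames agree.
    TAPIR learns these updates with a network; this hand-written
    heuristic only illustrates the control flow."""
    T, H, W, _ = frame_feats.shape
    track = track.copy()
    for _ in range(iters):
        new = track.copy()
        for t in range(T):
            y, x = int(round(track[t, 0])), int(round(track[t, 1]))
            y0, y1 = max(0, y - radius), min(H, y + radius + 1)
            x0, x1 = max(0, x - radius), min(W, x + radius + 1)
            local = frame_feats[t, y0:y1, x0:x1] @ query_feat       # local correlation map
            dy, dx = np.unravel_index(local.argmax(), local.shape)
            peak = np.array([y0 + dy, x0 + dx], dtype=float)
            new[t] = track[t] + step * (peak - track[t])            # move toward the peak
        smoothed = new.copy()
        smoothed[1:-1] = 0.25 * new[:-2] + 0.5 * new[1:-1] + 0.25 * new[2:]  # temporal coupling
        track = smoothed
    return track
```

In the real model, the refinement stage operates on local correlation features with a learned update rule and also refines the query features themselves; the sketch above only mirrors the initialize-then-refine structure the abstract describes.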
Record ID: oxford-uuid:b36873c1-ea87-40d7-9e29-7d36c67669ef
Institution: University of Oxford