TAPIR: tracking any point with per-frame initialization and temporal refinement
Main authors: Doersch, C; Yang, Y; Vecerik, M; Gokay, D; Gupta, A; Aytar, Y; Carreira, J; Zisserman, A
Format: Conference item
Language: English
Published: IEEE, 2024
author | Doersch, C Yang, Y Vecerik, M Gokay, D Gupta, A Aytar, Y Carreira, J Zisserman, A |
collection | OXFORD |
description | We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence. Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations. The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS. Our model facilitates fast inference on long and high-resolution video sequences. On a modern GPU, our implementation has the capacity to track points faster than real-time. Given the high-quality trajectories extracted from a large dataset, we demonstrate a proof-of-concept diffusion model which generates trajectories from static images, enabling plausible animations. Visualizations, source code, and pretrained models can be found at https://deepmind-tapir.github.io. |
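The description above outlines a two-stage pipeline: a matching stage that independently finds the best candidate location for the query point in every frame, and a refinement stage that adjusts the trajectory using local correlations. The sketch below illustrates that idea in miniature with NumPy: matching as a global argmax over a feature cost volume, and refinement as repeated local argmax searches in a small window around each estimate. This is a simplified illustration, not TAPIR's actual architecture; all function names, the window `radius`, and the iteration count are assumptions, and the real model uses learned features and learned iterative updates.

```python
import numpy as np

def match_stage(query_feat, frame_feats):
    """Per-frame matching (simplified illustration): for each frame,
    pick the location whose feature correlates best with the query
    feature, i.e. a global argmax over a dot-product cost volume."""
    T, H, W, C = frame_feats.shape
    costs = frame_feats.reshape(T, H * W, C) @ query_feat  # (T, H*W)
    idx = costs.argmax(axis=1)
    # Return one (y, x) candidate per frame.
    return np.stack([idx // W, idx % W], axis=1).astype(float)

def refine_stage(tracks, frame_feats, query_feat, radius=2, iters=3):
    """Refinement (simplified illustration): nudge each frame's estimate
    toward the best-correlated location inside a small local window,
    a crude stand-in for learned updates based on local correlations."""
    T, H, W, C = frame_feats.shape
    tracks = tracks.copy()
    for _ in range(iters):
        for t in range(T):
            y, x = int(round(tracks[t, 0])), int(round(tracks[t, 1]))
            y0, y1 = max(0, y - radius), min(H, y + radius + 1)
            x0, x1 = max(0, x - radius), min(W, x + radius + 1)
            local = frame_feats[t, y0:y1, x0:x1] @ query_feat
            dy, dx = np.unravel_index(local.argmax(), local.shape)
            tracks[t] = (y0 + dy, x0 + dx)
    return tracks
```

On a toy video where the query feature is planted at known positions amid weak noise, matching recovers each position per frame and refinement leaves the (already correct) trajectory fixed; in the real setting, refinement corrects frames where the global match landed on a distractor.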
first_indexed | 2024-09-25T04:09:21Z |
format | Conference item |
id | oxford-uuid:b36873c1-ea87-40d7-9e29-7d36c67669ef |
institution | University of Oxford |
language | English |
last_indexed | 2024-09-25T04:09:21Z |
publishDate | 2024 |
publisher | IEEE |
record_format | dspace |
title | TAPIR: tracking any point with per-frame initialization and temporal refinement |