Towards Longer Long-Range Motion Trajectories

Although dense, long-range, motion trajectories are a prominent representation of motion in videos, there is still no good solution for constructing dense motion tracks in a truly long-range fashion. Ideally, we would want every scene feature that appears in multiple, not necessarily contiguous,...

Full description

Bibliographic Details
Main Authors: Rubinstein, Michael, Liu, Ce, Freeman, William T.
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language:en_US
Published: British Machine Vision Association 2015
Online Access:http://hdl.handle.net/1721.1/100283
https://orcid.org/0000-0002-3707-3807
https://orcid.org/0000-0002-2231-7995
Description
Summary:Although dense, long-range, motion trajectories are a prominent representation of motion in videos, there is still no good solution for constructing dense motion tracks in a truly long-range fashion. Ideally, we would want every scene feature that appears in multiple, not necessarily contiguous, parts of the sequence to be associated with the same motion track. Despite this reasonable and clearly stated objective, there has been surprisingly little work on general-purpose algorithms that can accomplish this task. State-of-the-art dense motion trackers process the sequence incrementally in a frame-by-frame manner, and associate, by design, features that disappear and reappear in the video, with different tracks, thereby losing important information of the long-term motion signal. In this paper, we strive towards an algorithm for producing generic long-range motion trajectories that are robust to occlusion, deformation and camera motion. We leverage accurate local (short-range) trajectories produced by current motion tracking methods and use them as an initial estimate for a global (long-range) solution. Our algorithm re-correlates the short trajectories and links them to form a long-range motion representation by formulating a combinatorial assignment problem that is defined and optimized globally over the entire sequence. This allows to correlate features in arbitrarily distinct parts of the sequence, as well as handle tracking ambiguities by spatiotemporal regularization. We report the results of the algorithm on both synthetic and natural videos, and evaluate the long-range motion representation for action recognition.