Generating videos with scene dynamics

We capitalize on large amounts of unlabeled video in order to learn a model of scene dynamics for both video recognition tasks (e.g., action classification) and video generation tasks (e.g., future prediction). We propose a generative adversarial network for video with a spatio-temporal convolutional architecture that untangles the scene's foreground from the background. Experiments suggest this model can generate tiny videos up to a second long at full frame rate better than simple baselines, and we show its utility for predicting plausible futures of static images. Moreover, experiments and visualizations show the model internally learns useful features for recognizing actions with minimal supervision, suggesting scene dynamics are a promising signal for representation learning. We believe generative video models can impact many applications in video understanding and simulation.
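
For readers who want a concrete picture of the architecture the abstract describes, the sketch below shows a minimal two-stream generator of this kind in PyTorch: a spatio-temporal (3D-convolutional) foreground stream produces a short video plus a mask, a 2D-convolutional background stream produces a single static frame, and the two are composited so the scene's foreground is untangled from its background. This is an illustrative sketch under assumed settings, not the authors' released code; the class name `TwoStreamVideoGenerator`, the layer sizes, and the output resolution (32 frames of 64x64) are assumptions chosen for the example.

```python
# Illustrative sketch of a two-stream video GAN generator (assumed sizes,
# not the authors' released implementation).
import torch
import torch.nn as nn

class TwoStreamVideoGenerator(nn.Module):
    def __init__(self, z_dim=100):
        super().__init__()
        # Foreground stream: 3D transposed convolutions upsample a latent
        # code into features for a (3, 32, 64, 64) video and a mask.
        self.fg = nn.Sequential(
            nn.ConvTranspose3d(z_dim, 512, kernel_size=(2, 4, 4)),
            nn.BatchNorm3d(512), nn.ReLU(True),
            nn.ConvTranspose3d(512, 256, 4, stride=2, padding=1),
            nn.BatchNorm3d(256), nn.ReLU(True),
            nn.ConvTranspose3d(256, 128, 4, stride=2, padding=1),
            nn.BatchNorm3d(128), nn.ReLU(True),
            nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1),
            nn.BatchNorm3d(64), nn.ReLU(True),
        )
        self.fg_video = nn.ConvTranspose3d(64, 3, 4, stride=2, padding=1)
        self.fg_mask = nn.ConvTranspose3d(64, 1, 4, stride=2, padding=1)
        # Background stream: 2D transposed convolutions produce one static
        # frame that is later replicated across time.
        self.bg = nn.Sequential(
            nn.ConvTranspose2d(z_dim, 512, kernel_size=4),
            nn.BatchNorm2d(512), nn.ReLU(True),
            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1),
            nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),
            nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),
            nn.Tanh(),
        )

    def forward(self, z):
        h = self.fg(z.view(z.size(0), -1, 1, 1, 1))
        video = torch.tanh(self.fg_video(h))           # moving foreground
        mask = torch.sigmoid(self.fg_mask(h))          # where motion happens
        still = self.bg(z.view(z.size(0), -1, 1, 1))   # static background
        still = still.unsqueeze(2).expand_as(video)    # replicate over time
        return mask * video + (1 - mask) * still       # composite the scene

z = torch.randn(1, 100)
print(TwoStreamVideoGenerator()(z).shape)  # torch.Size([1, 3, 32, 64, 64])
```

Compositing through a learned mask is what forces the network to commit to a static background while the 3D foreground stream models motion; in an adversarial setup, a spatio-temporal convolutional discriminator (not shown) would then score whole clips as real or generated.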

Bibliographic Details
Main Authors: Vondrick, Carl; Pirsiavash, Hamed; Torralba, Antonio
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article (conference paper)
Language: English
Published: 2016 (made available in DSpace@MIT, 2020)
Online Access: https://hdl.handle.net/1721.1/124545
Citation: Vondrick, Carl, Hamed Pirsiavash, and Antonio Torralba. "Generating videos with scene dynamics." Advances in Neural Information Processing Systems 29 (2016).
Publisher: Neural Information Processing Systems (NIPS)
Publisher URL: https://papers.nips.cc/paper/6194-generating-videos-with-scene-dynamics
Funding: NSF (grant no. 1524817)
Rights: ©2016 Author(s). Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
Note: Presented at a poster session of the Conference on Neural Information Processing Systems (NIPS 2016), December 5-10, 2016, Barcelona, Spain.