Text-to-4D dynamic scene generation

We present MAV3D (Make-A-Video3D), a method for generating three-dimensional dynamic scenes from text descriptions. Our approach uses a 4D dynamic Neural Radiance Field (NeRF), which i...


Bibliographic Details
Main Authors: Singer, U; Sheynin, S; Polyak, A; Ashual, O; Makarov, I; Kokkinos, F; Goyal, N; Vedaldi, A; Parikh, D; Johnson, J; Taigman, Y
Format: Conference item
Language: English
Published: Proceedings of Machine Learning Research 2023
_version_ 1826311368997863424
author Singer, U
Sheynin, S
Polyak, A
Ashual, O
Makarov, I
Kokkinos, F
Goyal, N
Vedaldi, A
Parikh, D
Johnson, J
Taigman, Y
author_sort Singer, U
collection OXFORD
description We present MAV3D (Make-A-Video3D), a method for generating three-dimensional dynamic scenes from text descriptions. Our approach uses a 4D dynamic Neural Radiance Field (NeRF), which is optimized for scene appearance, density, and motion consistency by querying a Text-to-Video (T2V) diffusion-based model. The dynamic video output generated from the provided text can be viewed from any camera location and angle, and can be composited into any 3D environment. MAV3D does not require any 3D or 4D data and the T2V model is trained only on Text-Image pairs and unlabeled videos. We demonstrate the effectiveness of our approach using comprehensive quantitative and qualitative experiments and show an improvement over previously established internal baselines. To the best of our knowledge, our method is the first to generate 3D dynamic scenes given a text description. Generated samples can be viewed at make-a-video3d.github.io
first_indexed 2024-03-07T08:07:16Z
format Conference item
id oxford-uuid:de98f139-bcfd-44f5-8cb3-1e44ae868f4c
institution University of Oxford
language English
last_indexed 2024-03-07T08:07:16Z
publishDate 2023
publisher Proceedings of Machine Learning Research
record_format dspace
spelling oxford-uuid:de98f139-bcfd-44f5-8cb3-1e44ae868f4c | 2023-11-03T12:38:01Z | Text-to-4D dynamic scene generation | Conference item | http://purl.org/coar/resource_type/c_5794 | uuid:de98f139-bcfd-44f5-8cb3-1e44ae868f4c | English | Symplectic Elements | Proceedings of Machine Learning Research | 2023 | Singer, U; Sheynin, S; Polyak, A; Ashual, O; Makarov, I; Kokkinos, F; Goyal, N; Vedaldi, A; Parikh, D; Johnson, J; Taigman, Y
title Text-to-4D dynamic scene generation