Text-to-4D dynamic scene generation

We present MAV3D (Make-A-Video3D), a method for generating three-dimensional dynamic scenes from text descriptions. Our approach uses a 4D dynamic Neural Radiance Field (NeRF), which i...

Bibliographic Details
Main Authors:	Singer, U, Sheynin, S, Polyak, A, Ashual, O, Makarov, I, Kokkinos, F, Goyal, N, Vedaldi, A, Parikh, D, Johnson, J, Taigman, Y
Format:	Conference item
Language:	English
Published:	Proceedings of Machine Learning Research 2023

collection	OXFORD
description	We present MAV3D (Make-A-Video3D), a method for generating three-dimensional dynamic scenes from text descriptions. Our approach uses a 4D dynamic Neural Radiance Field (NeRF), which is optimized for scene appearance, density, and motion consistency by querying a Text-to-Video (T2V) diffusion-based model. The dynamic video output generated from the provided text can be viewed from any camera location and angle, and can be composited into any 3D environment. MAV3D does not require any 3D or 4D data and the T2V model is trained only on Text-Image pairs and unlabeled videos. We demonstrate the effectiveness of our approach using comprehensive quantitative and qualitative experiments and show an improvement over previously established internal baselines. To the best of our knowledge, our method is the first to generate 3D dynamic scenes given a text description. Generated samples can be viewed at make-a-video3d.github.io
first_indexed	2024-03-07T08:07:16Z
format	Conference item
id	oxford-uuid:de98f139-bcfd-44f5-8cb3-1e44ae868f4c
institution	University of Oxford
language	English
last_indexed	2024-03-07T08:07:16Z
publishDate	2023
publisher	Proceedings of Machine Learning Research
record_format	dspace
title	Text-to-4D dynamic scene generation

Similar Items