Foley Music: Learning to Generate Music from Videos
In this paper, we introduce Foley Music, a system that can synthesize plausible music for a silent video clip of people playing musical instruments. We first identify two key intermediate representations for a successful video-to-music generator: body keypoints from videos and MIDI events from audio recordings. We then formulate music generation from videos as a motion-to-MIDI translation problem. We present a Graph-Transformer framework that can accurately predict MIDI event sequences in accordance with the body movements. The MIDI events can then be converted to realistic music using an off-the-shelf music synthesizer tool.
Main Authors: | Gan, Chuang; Huang, Deng; Chen, Peihao; Tenenbaum, Joshua B; Torralba, Antonio |
---|---|
Other Authors: | MIT-IBM Watson AI Lab |
Format: | Book |
Language: | English |
Published: | Springer International Publishing, 2021 |
Online Access: | https://hdl.handle.net/1721.1/130350 |
_version_ | 1811078488654872576 |
---|---|
author | Gan, Chuang Huang, Deng Chen, Peihao Tenenbaum, Joshua B Torralba, Antonio |
author2 | MIT-IBM Watson AI Lab |
author_facet | MIT-IBM Watson AI Lab Gan, Chuang Huang, Deng Chen, Peihao Tenenbaum, Joshua B Torralba, Antonio |
author_sort | Gan, Chuang |
collection | MIT |
description | In this paper, we introduce Foley Music, a system that can synthesize plausible music for a silent video clip of people playing musical instruments. We first identify two key intermediate representations for a successful video-to-music generator: body keypoints from videos and MIDI events from audio recordings. We then formulate music generation from videos as a motion-to-MIDI translation problem. We present a Graph-Transformer framework that can accurately predict MIDI event sequences in accordance with the body movements. The MIDI events can then be converted to realistic music using an off-the-shelf music synthesizer tool. We demonstrate the effectiveness of our models on videos containing a variety of music performances. Experimental results show that our model outperforms several existing systems in generating music that is pleasant to listen to. More importantly, the MIDI representations are fully interpretable and transparent, thus enabling us to perform music editing flexibly. We encourage the readers to watch the supplementary video with audio turned on to experience the results. |
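The abstract describes a pipeline in which the model predicts a sequence of MIDI events, which are then rendered to audio. To make the intermediate representation concrete, here is a minimal sketch of decoding such an event sequence into timed notes. This follows a common MIDI-event vocabulary (note-on, note-off, time-shift), not necessarily the paper's exact encoding; the function name and tick resolution are illustrative assumptions.

```python
# Sketch: decode a sequence of (event_type, value) MIDI-style tokens
# into (pitch, start_time, end_time) notes. The event vocabulary here
# (note_on / note_off / time_shift) is an assumption modeled on common
# MIDI-event encodings, not the paper's exact scheme.

def decode_events(events, ticks_per_second=100):
    """Turn (kind, value) events into a list of (pitch, start, end) notes."""
    time = 0.0
    active = {}   # pitch -> start time of the currently sounding note
    notes = []
    for kind, value in events:
        if kind == "time_shift":      # advance the running clock by `value` ticks
            time += value / ticks_per_second
        elif kind == "note_on":       # this pitch starts sounding now
            active[value] = time
        elif kind == "note_off":      # close the note if it is sounding
            if value in active:
                notes.append((value, active.pop(value), time))
    return notes

events = [
    ("note_on", 60), ("time_shift", 50),   # C4 starts, 0.5 s passes
    ("note_on", 64), ("time_shift", 50),   # E4 starts, 0.5 s passes
    ("note_off", 60), ("note_off", 64),
]
print(decode_events(events))
# [(60, 0.0, 1.0), (64, 0.5, 1.0)]
```

Once decoded, such note lists can be written to a standard MIDI file and rendered with any off-the-shelf synthesizer, which is what makes the representation editable and interpretable as the abstract claims.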
first_indexed | 2024-09-23T11:00:45Z |
format | Book |
id | mit-1721.1/130350 |
institution | Massachusetts Institute of Technology |
language | English |
last_indexed | 2024-09-23T11:00:45Z |
publishDate | 2021 |
publisher | Springer International Publishing |
record_format | dspace |
spelling | mit-1721.1/1303502022-09-27T16:31:28Z Foley Music: Learning to Generate Music from Videos Gan, Chuang Huang, Deng Chen, Peihao Tenenbaum, Joshua B Torralba, Antonio MIT-IBM Watson AI Lab Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory In this paper, we introduce Foley Music, a system that can synthesize plausible music for a silent video clip of people playing musical instruments. We first identify two key intermediate representations for a successful video-to-music generator: body keypoints from videos and MIDI events from audio recordings. We then formulate music generation from videos as a motion-to-MIDI translation problem. We present a Graph-Transformer framework that can accurately predict MIDI event sequences in accordance with the body movements. The MIDI events can then be converted to realistic music using an off-the-shelf music synthesizer tool. We demonstrate the effectiveness of our models on videos containing a variety of music performances. Experimental results show that our model outperforms several existing systems in generating music that is pleasant to listen to. More importantly, the MIDI representations are fully interpretable and transparent, thus enabling us to perform music editing flexibly. We encourage the readers to watch the supplementary video with audio turned on to experience the results. ONR MURI (N00014-16-1-2007) 2021-04-02T14:22:06Z 2021-04-02T14:22:06Z 2020-11 2021-01-28T15:39:50Z Book http://purl.org/eprint/type/ConferencePaper 9783030586201 9783030586218 0302-9743 1611-3349 https://hdl.handle.net/1721.1/130350 Gan, Chuang et al. "Foley Music: Learning to Generate Music from Videos." ECCV: European Conference on Computer Vision, Lecture Notes in Computer Science, 12356, Springer International Publishing, 2020, 758-775.
© 2020 Springer Nature Switzerland AG en http://dx.doi.org/10.1007/978-3-030-58621-8_44 Lecture Notes in Computer Science Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf Springer International Publishing arXiv |
spellingShingle | Gan, Chuang Huang, Deng Chen, Peihao Tenenbaum, Joshua B Torralba, Antonio Foley Music: Learning to Generate Music from Videos |
title | Foley Music: Learning to Generate Music from Videos |
title_full | Foley Music: Learning to Generate Music from Videos |
title_fullStr | Foley Music: Learning to Generate Music from Videos |
title_full_unstemmed | Foley Music: Learning to Generate Music from Videos |
title_short | Foley Music: Learning to Generate Music from Videos |
title_sort | foley music learning to generate music from videos |
url | https://hdl.handle.net/1721.1/130350 |
work_keys_str_mv | AT ganchuang foleymusiclearningtogeneratemusicfromvideos AT huangdeng foleymusiclearningtogeneratemusicfromvideos AT chenpeihao foleymusiclearningtogeneratemusicfromvideos AT tenenbaumjoshuab foleymusiclearningtogeneratemusicfromvideos AT torralbaantonio foleymusiclearningtogeneratemusicfromvideos |