The Sound of Motions

© 2019 IEEE. Sounds originate from object motions and vibrations of surrounding air. Inspired by the fact that humans are capable of interpreting sound sources from how objects move visually, we propose a novel system that explicitly captures such motion cues for the task of sound localization and se...

Bibliographic Details
Main Authors: Zhao, Hang, Gan, Chuang, Ma, Wei-Chiu, Torralba, Antonio
Other Authors: MIT-IBM Watson AI Lab
Format: Article
Language: English
Published: IEEE 2021
Online Access: https://hdl.handle.net/1721.1/137169
_version_ 1811081598190223360
author Zhao, Hang
Gan, Chuang
Ma, Wei-Chiu
Torralba, Antonio
author2 MIT-IBM Watson AI Lab
author_facet MIT-IBM Watson AI Lab
Zhao, Hang
Gan, Chuang
Ma, Wei-Chiu
Torralba, Antonio
author_sort Zhao, Hang
collection MIT
description © 2019 IEEE. Sounds originate from object motions and vibrations of surrounding air. Inspired by the fact that humans are capable of interpreting sound sources from how objects move visually, we propose a novel system that explicitly captures such motion cues for the task of sound localization and separation. Our system is composed of an end-to-end learnable model called Deep Dense Trajectory (DDT) and a curriculum learning scheme. It exploits the inherent coherence of audio-visual signals from large quantities of unlabeled videos. Quantitative and qualitative evaluations show that, compared to previous models that rely on visual appearance cues, our motion-based system improves performance in separating musical instrument sounds. Furthermore, it separates sound components from duets of the same category of instruments, a challenging problem that has not been addressed before.
first_indexed 2024-09-23T11:49:20Z
format Article
id mit-1721.1/137169
institution Massachusetts Institute of Technology
language English
last_indexed 2024-09-23T11:49:20Z
publishDate 2021
publisher IEEE
record_format dspace
spelling mit-1721.1/1371692023-02-09T18:12:17Z The Sound of Motions Zhao, Hang Gan, Chuang Ma, Wei-Chiu Torralba, Antonio MIT-IBM Watson AI Lab Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science © 2019 IEEE. Sounds originate from object motions and vibrations of surrounding air. Inspired by the fact that humans are capable of interpreting sound sources from how objects move visually, we propose a novel system that explicitly captures such motion cues for the task of sound localization and separation. Our system is composed of an end-to-end learnable model called Deep Dense Trajectory (DDT) and a curriculum learning scheme. It exploits the inherent coherence of audio-visual signals from large quantities of unlabeled videos. Quantitative and qualitative evaluations show that, compared to previous models that rely on visual appearance cues, our motion-based system improves performance in separating musical instrument sounds. Furthermore, it separates sound components from duets of the same category of instruments, a challenging problem that has not been addressed before. 2021-11-02T18:55:26Z 2021-11-02T18:55:26Z 2019-10 2021-04-15T17:53:25Z Article http://purl.org/eprint/type/ConferencePaper https://hdl.handle.net/1721.1/137169 Zhao, Hang, Gan, Chuang, Ma, Wei-Chiu and Torralba, Antonio. 2019. "The Sound of Motions." Proceedings of the IEEE International Conference on Computer Vision, 2019-October. en 10.1109/iccv.2019.00182 Proceedings of the IEEE International Conference on Computer Vision Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf IEEE arXiv
spellingShingle Zhao, Hang
Gan, Chuang
Ma, Wei-Chiu
Torralba, Antonio
The Sound of Motions
title The Sound of Motions
url https://hdl.handle.net/1721.1/137169