The Sound of Motions
© 2019 IEEE. Sounds originate from object motions and vibrations of surrounding air. Inspired by the fact that humans are capable of interpreting sound sources from how objects move visually, we propose a novel system that explicitly captures such motion cues for the task of sound localization and se...
Main Authors: | Zhao, Hang; Gan, Chuang; Ma, Wei-Chiu; Torralba, Antonio |
---|---|
Other Authors: | MIT-IBM Watson AI Lab |
Format: | Article |
Language: | English |
Published: | IEEE 2021 |
Online Access: | https://hdl.handle.net/1721.1/137169 |
_version_ | 1811081598190223360 |
---|---|
author | Zhao, Hang Gan, Chuang Ma, Wei-Chiu Torralba, Antonio |
author2 | MIT-IBM Watson AI Lab |
author_facet | MIT-IBM Watson AI Lab Zhao, Hang Gan, Chuang Ma, Wei-Chiu Torralba, Antonio |
author_sort | Zhao, Hang |
collection | MIT |
description | © 2019 IEEE. Sounds originate from object motions and vibrations of surrounding air. Inspired by the fact that humans are capable of interpreting sound sources from how objects move visually, we propose a novel system that explicitly captures such motion cues for the task of sound localization and separation. Our system is composed of an end-to-end learnable model called Deep Dense Trajectory (DDT) and a curriculum learning scheme. It exploits the inherent coherence of audio-visual signals from large quantities of unlabeled videos. Quantitative and qualitative evaluations show that, compared to previous models that rely on visual appearance cues, our motion-based system improves performance in separating musical instrument sounds. Furthermore, it separates sound components from duets of the same category of instruments, a challenging problem that has not been addressed before. |
first_indexed | 2024-09-23T11:49:20Z |
format | Article |
id | mit-1721.1/137169 |
institution | Massachusetts Institute of Technology |
language | English |
last_indexed | 2024-09-23T11:49:20Z |
publishDate | 2021 |
publisher | IEEE |
record_format | dspace |
spelling | mit-1721.1/137169 2023-02-09T18:12:17Z The Sound of Motions Zhao, Hang Gan, Chuang Ma, Wei-Chiu Torralba, Antonio MIT-IBM Watson AI Lab Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science © 2019 IEEE. Sounds originate from object motions and vibrations of surrounding air. Inspired by the fact that humans are capable of interpreting sound sources from how objects move visually, we propose a novel system that explicitly captures such motion cues for the task of sound localization and separation. Our system is composed of an end-to-end learnable model called Deep Dense Trajectory (DDT) and a curriculum learning scheme. It exploits the inherent coherence of audio-visual signals from large quantities of unlabeled videos. Quantitative and qualitative evaluations show that, compared to previous models that rely on visual appearance cues, our motion-based system improves performance in separating musical instrument sounds. Furthermore, it separates sound components from duets of the same category of instruments, a challenging problem that has not been addressed before. 2021-11-02T18:55:26Z 2021-11-02T18:55:26Z 2019-10 2021-04-15T17:53:25Z Article http://purl.org/eprint/type/ConferencePaper https://hdl.handle.net/1721.1/137169 Zhao, Hang, Gan, Chuang, Ma, Wei-Chiu and Torralba, Antonio. 2019. "The Sound of Motions." Proceedings of the IEEE International Conference on Computer Vision, 2019-October. en 10.1109/iccv.2019.00182 Proceedings of the IEEE International Conference on Computer Vision Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf IEEE arXiv |
spellingShingle | Zhao, Hang Gan, Chuang Ma, Wei-Chiu Torralba, Antonio The Sound of Motions |
title | The Sound of Motions |
title_full | The Sound of Motions |
title_fullStr | The Sound of Motions |
title_full_unstemmed | The Sound of Motions |
title_short | The Sound of Motions |
title_sort | sound of motions |
url | https://hdl.handle.net/1721.1/137169 |
work_keys_str_mv | AT zhaohang thesoundofmotions AT ganchuang thesoundofmotions AT maweichiu thesoundofmotions AT torralbaantonio thesoundofmotions AT zhaohang soundofmotions AT ganchuang soundofmotions AT maweichiu soundofmotions AT torralbaantonio soundofmotions |