Shape and material from sound
Hearing an object falling onto the ground, humans can recover rich information including its rough shape, material, and falling height. In this paper, we build machines to approximate such competency. We first mimic human knowledge of the physical world by building an efficient, physics-based simulation engine. Then, we present an analysis-by-synthesis approach to infer properties of the falling object. We further accelerate the process by learning a mapping from a sound wave to object properties, and using the predicted values to initialize the inference. This mapping can be viewed as an approximation of human commonsense learned from past experience. Our model performs well on both synthetic audio clips and real recordings without requiring any annotated data. We conduct behavior studies to compare human responses with ours on estimating object shape, material, and falling height from sound. Our model achieves near-human performance.
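The abstract describes an analysis-by-synthesis pipeline: a physics-based engine synthesizes candidate sounds, and inference searches for object properties whose simulated sound matches the observation, initialized by a learned sound-to-properties mapping. A minimal illustrative sketch follows; the toy sound features, the hill-climbing refinement, and all function names are assumptions for illustration, not the authors' implementation:

```python
import random

def simulate_sound(shape, material, height):
    """Stand-in for the physics-based audio synthesis engine.

    Toy features only: material and shape set a base frequency,
    fall height sets an amplitude-like second feature.
    """
    base = {"wood": 400.0, "metal": 1200.0, "ceramic": 800.0}[material]
    return [base * (1 + 0.1 * shape), height]

def spectral_distance(a, b):
    """L1 distance between two toy sound-feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def infer(observed, init, n_iters=200, seed=0):
    """Analysis-by-synthesis: perturb (shape, material, height) and keep
    candidates whose simulated sound better matches the observation."""
    rng = random.Random(seed)
    best = init
    best_cost = spectral_distance(simulate_sound(*best), observed)
    for _ in range(n_iters):
        shape, material, height = best
        # Crude random proposal over the latent object properties.
        cand = (
            max(0, min(4, shape + rng.choice([-1, 0, 1]))),
            rng.choice(["wood", "metal", "ceramic"]),
            max(0.1, height + rng.gauss(0, 0.1)),
        )
        cost = spectral_distance(simulate_sound(*cand), observed)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best

# Observation synthesized from a hidden ground truth.
truth = (2, "metal", 1.0)
observed = simulate_sound(*truth)

# In the paper, a learned network supplies the initial guess from the
# sound wave; here it is simply hard-coded.
guess = infer(observed, init=(0, "wood", 0.5))
```

The learned initializer matters because the greedy search above converges faster (and to better optima) when started near the true properties, which is exactly the acceleration the abstract describes.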
Main Authors: | Zhang, Zhoutong; Li, Qiujia; Huang, Zhengjia; Wu, Jiajun; Tenenbaum, Joshua B.; Freeman, William T. |
---|---|
Other Authors: | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
Format: | Article |
Language: | English |
Published: | Neural Information Processing Systems Foundation, 2020 |
Online Access: | https://hdl.handle.net/1721.1/124779 |
_version_ | 1826213423357100032 |
---|---|
author | Zhang, Zhoutong Li, Qiujia Huang, Zhengjia Wu, Jiajun Tenenbaum, Joshua B. Freeman, William T. |
author2 | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
author_sort | Zhang, Zhoutong |
collection | MIT |
description | Hearing an object falling onto the ground, humans can recover rich information including its rough shape, material, and falling height. In this paper, we build machines to approximate such competency. We first mimic human knowledge of the physical world by building an efficient, physics-based simulation engine. Then, we present an analysis-by-synthesis approach to infer properties of the falling object. We further accelerate the process by learning a mapping from a sound wave to object properties, and using the predicted values to initialize the inference. This mapping can be viewed as an approximation of human commonsense learned from past experience. Our model performs well on both synthetic audio clips and real recordings without requiring any annotated data. We conduct behavior studies to compare human responses with ours on estimating object shape, material, and falling height from sound. Our model achieves near-human performance. |
first_indexed | 2024-09-23T15:48:51Z |
format | Article |
id | mit-1721.1/124779 |
institution | Massachusetts Institute of Technology |
language | English |
last_indexed | 2024-09-23T15:48:51Z |
publishDate | 2020 |
publisher | Neural Information Processing Systems Foundation |
record_format | dspace |
spelling | mit-1721.1/124779 (2022-09-29T16:20:00Z). Shape and material from sound. Zhang, Zhoutong; Li, Qiujia; Huang, Zhengjia; Wu, Jiajun; Tenenbaum, Joshua B.; Freeman, William T. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science; Department of Brain and Cognitive Sciences. Funding: National Science Foundation (U.S.) (1212849; 1447476; STC Award CCF-1231216); United States. Office of Naval Research, Multidisciplinary University Research Initiative (N00014-16-1-2007); Toyota Research Institute; Samsung (Firm); Shell. Deposited 2020-04-22T02:40:25Z; published 2017; record updated 2019-05-28T12:50:08Z. Article (http://purl.org/eprint/type/ConferencePaper), ISSN 1049-5258, application/pdf, English. https://hdl.handle.net/1721.1/124779 Citation: Zhang, Zhoutong et al. "Shape and material from sound." Advances in Neural Information Processing Systems (NIPS 2017), 30 (2017). © 2017 Neural Information Processing Systems Foundation. https://papers.nips.cc/paper/6727-shape-and-material-from-sound Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. Neural Information Processing Systems Foundation; Neural Information Processing Systems (NIPS). |
title | Shape and material from sound |
url | https://hdl.handle.net/1721.1/124779 |