Shape and material from sound

Hearing an object falling onto the ground, humans can recover rich information including its rough shape, material, and falling height. In this paper, we build machines to approximate such competency. We first mimic human knowledge of the physical world by building an efficient, physics-based simulation engine. Then, we present an analysis-by-synthesis approach to infer properties of the falling object. We further accelerate the process by learning a mapping from a sound wave to object properties, and using the predicted values to initialize the inference. This mapping can be viewed as an approximation of human commonsense learned from past experience. Our model performs well on both synthetic audio clips and real recordings without requiring any annotated data. We conduct behavior studies to compare human responses with ours on estimating object shape, material, and falling height from sound. Our model achieves near-human performance.
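The pipeline the abstract describes (a fast learned predictor whose output seeds a slower analysis-by-synthesis search against a simulator) can be sketched in miniature. This is purely illustrative and not the authors' code: a single damped sinusoid stands in for their physics-based sound simulator, a coarse grid lookup stands in for their learned sound-to-properties network, and random local search stands in for their inference procedure. All function names and parameters here are hypothetical.

```python
import numpy as np

SR = 16000  # sample rate in Hz (arbitrary choice for this toy example)

def synthesize(freq_hz, decay, n=4000):
    """Toy 'simulator': model a struck object as one exponentially damped mode."""
    t = np.arange(n) / SR
    return np.exp(-decay * t) * np.sin(2 * np.pi * freq_hz * t)

def loss(observed, freq_hz, decay):
    """Mismatch between the observed sound and a candidate synthesis."""
    return np.mean((observed - synthesize(freq_hz, decay)) ** 2)

def learned_init(observed, freq_grid=(220, 440, 880), decay_grid=(5, 20, 80)):
    """Stand-in for the learned mapping from sound to object properties:
    a cheap coarse-grid lookup that mimics a fast amortized prediction."""
    return min(((f, d) for f in freq_grid for d in decay_grid),
               key=lambda p: loss(observed, *p))

def infer(observed, iters=200, seed=0):
    """Analysis-by-synthesis: refine the learned initial guess by repeatedly
    perturbing the parameters and keeping any change that lowers the loss."""
    rng = np.random.default_rng(seed)
    f, d = learned_init(observed)
    best = loss(observed, f, d)
    for _ in range(iters):
        cf = f + rng.normal(0.0, 10.0)  # perturb frequency (Hz)
        cd = d + rng.normal(0.0, 2.0)   # perturb decay rate
        c = loss(observed, cf, cd)
        if c < best:
            f, d, best = cf, cd, c
    return f, d

# Recover the properties of a "recorded" impact sound.
target = synthesize(450.0, 25.0)
f_hat, d_hat = infer(target)
```

The design point the paper makes is visible even at this scale: the search alone would need many evaluations to find the right basin, while the learned initializer drops it near the answer so only local refinement remains.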

Bibliographic Details
Main Authors: Zhang, Zhoutong; Li, Qiujia; Huang, Zhengjia; Wu, Jiajun; Tenenbaum, Joshua B.; Freeman, William T.
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article (Conference Paper)
Language: English
Published: Neural Information Processing Systems Foundation, 2017
Online Access: https://hdl.handle.net/1721.1/124779
Additional Department: Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences
Citation: Zhang, Zhoutong et al. "Shape and material from sound." Advances in Neural Information Processing Systems (NIPS 2017), 30 (2017).
Journal: Advances in Neural Information Processing Systems
ISSN: 1049-5258
Publisher's Version: https://papers.nips.cc/paper/6727-shape-and-material-from-sound
Sponsors: National Science Foundation (U.S.) (1212849, 1447476, STC Award CCF-1231216); United States. Office of Naval Research. Multidisciplinary University Research Initiative (N00014-16-1-2007); Toyota Research Institute; Samsung (Firm); Shell
Rights: © 2017 Neural Information Processing Systems Foundation. Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.