Shape and material from sound
Hearing an object falling onto the ground, humans can recover rich information including its rough shape, material, and falling height. In this paper, we build machines to approximate such competency. We first mimic human knowledge of the physical world by building an efficient, physics-based simulation engine. Then, we present an analysis-by-synthesis approach to infer properties of the falling object. We further accelerate the process by learning a mapping from a sound wave to object properties, and using the predicted values to initialize the inference. This mapping can be viewed as an approximation of human commonsense learned from past experience. Our model performs well on both synthetic audio clips and real recordings without requiring any annotated data. We conduct behavior studies to compare human responses with ours on estimating object shape, material, and falling height from sound. Our model achieves near-human performance.
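The abstract describes an analysis-by-synthesis pipeline: a physics-based engine synthesizes candidate sounds, and inference searches for object properties whose simulated sound matches the observation, initialized by a learned sound-to-properties mapping. A minimal illustrative sketch follows; the toy sound features, the hill-climbing refinement, and all function names are assumptions for illustration, not the authors' implementation:

```python
import random

def simulate_sound(shape, material, height):
    """Stand-in for the physics-based audio synthesis engine.

    Toy features only: material and shape set a base frequency,
    fall height sets an amplitude-like second feature.
    """
    base = {"wood": 400.0, "metal": 1200.0, "ceramic": 800.0}[material]
    return [base * (1 + 0.1 * shape), height]

def spectral_distance(a, b):
    """L1 distance between two toy sound-feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def infer(observed, init, n_iters=200, seed=0):
    """Analysis-by-synthesis: perturb (shape, material, height) and keep
    candidates whose simulated sound better matches the observation."""
    rng = random.Random(seed)
    best = init
    best_cost = spectral_distance(simulate_sound(*best), observed)
    for _ in range(n_iters):
        shape, material, height = best
        # Crude random proposal over the latent object properties.
        cand = (
            max(0, min(4, shape + rng.choice([-1, 0, 1]))),
            rng.choice(["wood", "metal", "ceramic"]),
            max(0.1, height + rng.gauss(0, 0.1)),
        )
        cost = spectral_distance(simulate_sound(*cand), observed)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best

# Observation synthesized from a hidden ground truth.
truth = (2, "metal", 1.0)
observed = simulate_sound(*truth)

# In the paper, a learned network supplies the initial guess from the
# sound wave; here it is simply hard-coded.
guess = infer(observed, init=(0, "wood", 0.5))
```

The learned initializer matters because the greedy search above converges faster (and to better optima) when started near the true properties, which is exactly the acceleration the abstract describes.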
Main Authors: | Zhang, Zhoutong; Li, Qiujia; Huang, Zhengjia; Wu, Jiajun; Tenenbaum, Joshua B.; Freeman, William T. |
---|---|
Other Authors: | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
Format: | Article |
Language: | English |
Published: | Neural Information Processing Systems Foundation, 2020 |
Online Access: | https://hdl.handle.net/1721.1/124779 |
_version_ | 1826213423357100032 |
---|---|
author | Zhang, Zhoutong Li, Qiujia Huang, Zhengjia Wu, Jiajun Tenenbaum, Joshua B. Freeman, William T. |
author2 | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
author_sort | Zhang, Zhoutong |
collection | MIT |
description | Hearing an object falling onto the ground, humans can recover rich information including its rough shape, material, and falling height. In this paper, we build machines to approximate such competency. We first mimic human knowledge of the physical world by building an efficient, physics-based simulation engine. Then, we present an analysis-by-synthesis approach to infer properties of the falling object. We further accelerate the process by learning a mapping from a sound wave to object properties, and using the predicted values to initialize the inference. This mapping can be viewed as an approximation of human commonsense learned from past experience. Our model performs well on both synthetic audio clips and real recordings without requiring any annotated data. We conduct behavior studies to compare human responses with ours on estimating object shape, material, and falling height from sound. Our model achieves near-human performance. |
first_indexed | 2024-09-23T15:48:51Z |
format | Article |
id | mit-1721.1/124779 |
institution | Massachusetts Institute of Technology |
language | English |
last_indexed | 2024-09-23T15:48:51Z |
publishDate | 2020 |
publisher | Neural Information Processing Systems Foundation |
record_format | dspace |
spelling | mit-1721.1/124779 (2022-09-29T16:20:00Z). Shape and material from sound. Zhang, Zhoutong; Li, Qiujia; Huang, Zhengjia; Wu, Jiajun; Tenenbaum, Joshua B.; Freeman, William T. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science; Department of Brain and Cognitive Sciences. Funding: National Science Foundation (U.S.) (1212849; 1447476; STC Award CCF-1231216); United States. Office of Naval Research, Multidisciplinary University Research Initiative (N00014-16-1-2007); Toyota Research Institute; Samsung (Firm); Shell. Deposited 2020-04-22T02:40:25Z; published 2017; record updated 2019-05-28T12:50:08Z. Article (http://purl.org/eprint/type/ConferencePaper), ISSN 1049-5258, application/pdf, English. https://hdl.handle.net/1721.1/124779 Citation: Zhang, Zhoutong et al. "Shape and material from sound." Advances in Neural Information Processing Systems (NIPS 2017), 30 (2017). © 2017 Neural Information Processing Systems Foundation. https://papers.nips.cc/paper/6727-shape-and-material-from-sound Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. Neural Information Processing Systems Foundation; Neural Information Processing Systems (NIPS). |
title | Shape and material from sound |
url | https://hdl.handle.net/1721.1/124779 |