Objects that sound
In this paper our objectives are, first, networks that can embed audio and visual inputs into a common space that is suitable for cross-modal retrieval; and second, a network that can localize the object that sounds in an image, given the audio signal. We achieve both these objectives by training fr...
Main authors: | , |
---|---|
Format: | Conference item |
Language: | English |
Published: | Springer 2018 |