Separating the “chirp” from the “chat”: self-supervised visual grounding of sound and language

We present DenseAV, a novel dual encoder grounding architecture that learns high-resolution, semantically meaningful, and audio-visual aligned features solely through watching videos. We show that DenseAV can discover the “meaning” of words and the “location” of sounds without explicit localization...

Bibliographic details
Main authors:	Hamilton, M, Zisserman, A, Hershey, JR, Freeman, WT
Format:	Conference item
Language:	English
Published:	IEEE 2024

Similar items

Multi-task self-supervised visual learning
by: Doersch, C, et al.
Published: (2017)

Ambient Sound Provides Supervision for Visual Learning
by: Owens, Andrew Hale, et al.
Published: (2017)

Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning
by: Owens, Andrew, et al.
Published: (2021)

Self-Supervised Learning for Audio-Visual Relationships of Videos With Stereo Sounds
by: Tomoya Sato, et al.
Published: (2022-01-01)

Self-supervised learning of audio-visual objects from video
by: Afouras, T, et al.
Published: (2020)

Enhancement of sound by soft reflections in exponentially chirped crystals
by: A. Cebrecos, et al.
Published: (2014-12-01)

Music Gesture for Visual Sound Separation
by: Gan, Chuang, et al.
Published: (2021)

First observations of oblique ionospheric sounding chirp signal in Mexico
by: M.A. Sergeeva, et al.
Published: (2019-03-01)

Features of backscatter ionospheric sounding as studied with a chirp ionosonde
by: Ponomarchuk S.N., et al.
Published: (2017-09-01)

Audio-Visual Self-Supervised Terrain Type Recognition for Ground Mobile Platforms
by: Akiyoshi Kurobe, et al.
Published: (2021-01-01)

Self-supervised learning for spinal MRIs
by: Jamaludin, A, et al.
Published: (2017)

Self-Supervised Moving Vehicle Tracking With Stereo Sound
by: Gan, Chuang, et al.
Published: (2021)

Unsupervised discovery of visual object class hierarchies
by: Sivic, J, et al.
Published: (2008)

Localizing visual sounds the hard way
by: Vedaldi, A, et al.
Published: (2021)

Weakly supervised scale-invariant learning of models for visual recognition
by: Fergus, R, et al.
Published: (2006)

Enhancing bowel sound recognition with self-attention and self-supervised pre-training.
by: Yansuo Yu, et al.
Published: (2024-01-01)

Self-supervised co-training for video representation learning
by: Han, T, et al.
Published: (2020)

ESTIMATING ANTENNA COUPLING FACTOR FOR PROBLEM OF TOPSIDE IONOSPHERE SOUNDING FROM SPACE BY CHIRP SIGNALS
by: Podlesnyi A.V., et al.
Published: (2019-12-01)

Self-supervised learning of class embeddings from video
by: Wiles, O, et al.
Published: (2020)

Sight to Sound: An End-to-End Approach for Visual Piano Transcription
by: Koepke, S, et al.
Published: (2020)

ASDNet: An Efficient Self-Supervised Convolutional Network for Anomalous Sound Detection
by: Dewei Kong, et al.
Published: (2025-01-01)

Self-Supervised Transfer Learning from Natural Images for Sound Classification
by: Sungho Shin, et al.
Published: (2021-03-01)

Visually Indicated Sounds
by: Isola, Phillip, et al.
Published: (2017)

Direct Underwater Sound Velocity Measurement Based on the Acousto-Optic Self-Interference Effect between the Chirp Signal and the Optical Frequency Comb
by: Zihui Yang, et al.
Published: (2022-12-01)

Extraction of Individual EEG Gamma Frequencies from the Responses to Click-Based Chirp-Modulated Sounds
by: Aurimas Mockevičius, et al.
Published: (2023-03-01)

Chat2VIS: Generating Data Visualizations via Natural Language Using ChatGPT, Codex and GPT-3 Large Language Models
by: Paula Maddigan, et al.
Published: (2023-01-01)

Disentangled Speech Embeddings Using Cross-Modal Self-Supervision
by: Nagrani, A, et al.
Published: (2020)

Self-Supervised Audio-Visual Co-Segmentation
by: Rouditchenko, Andrew, et al.
Published: (2022)

Self-Supervised Audio-Visual Co-Segmentation
by: Rouditchenko, Andrew, et al.
Published: (2021)

Self-Supervised Autoencoders for Visual Anomaly Detection
by: Alexander Bauer, et al.
Published: (2024-12-01)

Combining Unsupervised and Supervised Learning for Sample Efficient Continuous Language Grounding
by: Oliver Roesler
Published: (2022-09-01)

Weakly-supervised fingerspelling recognition in British Sign Language videos
by: Prajwal, KR, et al.
Published: (2022)

Self-supervised learning of a facial attribute embedding from video
by: Wiles, O, et al.
Published: (2018)

Self-supervised video object segmentation by motion grouping
by: Yang, C, et al.
Published: (2021)

A Climate Hyperspectral Infrared Radiance Product (CHIRP) Combining the AIRS and CrIS Satellite Sounding Record
by: L. Larrabee Strow, et al.
Published: (2021-01-01)

Diagnostics of HF radio channel: based on data from backscatter ionospheric sounding by continuous chirp signal
by: Ponomarchuk S.N., et al.
Published: (2018-06-01)

Application of Optimized Adaptive Chirp Mode Decomposition Method in Chirp Signal
by: Junyuan Wang, et al.
Published: (2020-05-01)

Self-supervised multi-modal alignment for whole body medical imaging
by: Windsor, R, et al.
Published: (2021)

Now you're speaking my language: visual language identification
by: Afouras, T, et al.
Published: (2020)

Exploring the Utility of ChatGPT for Self-directed Online Language Learning
by: Zixi Li, et al.
Published: (2024-09-01)