Separating the “chirp” from the “chat”: self-supervised visual grounding of sound and language


We present DenseAV, a novel dual encoder grounding architecture that learns high-resolution, semantically meaningful, and audio-visually aligned features solely through watching videos. We show that DenseAV can discover the “meaning” of words and the “location” of sounds without explicit localization...


Bibliographic details
Main authors:	Hamilton, M, Zisserman, A, Hershey, JR, Freeman, WT
Format:	Conference item
Language:	English
Published / Created:	IEEE 2024

Similar items

Multi-task self-supervised visual learning
By: Doersch, C, et al.
Published / Created: (2017)

Ambient Sound Provides Supervision for Visual Learning
By: Owens, Andrew Hale, et al.
Published / Created: (2017)

Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning
By: Owens, Andrew, et al.
Published / Created: (2021)

Self-Supervised Learning for Audio-Visual Relationships of Videos With Stereo Sounds
By: Tomoya Sato, et al.
Published / Created: (2022-01-01)

Self-supervised learning of audio-visual objects from video
By: Afouras, T, et al.
Published / Created: (2020)

Enhancement of sound by soft reflections in exponentially chirped crystals
By: A. Cebrecos, et al.
Published / Created: (2014-12-01)

Music Gesture for Visual Sound Separation
By: Gan, Chuang, et al.
Published / Created: (2021)

First observations of oblique ionospheric sounding chirp signal in Mexico
By: M.A. Sergeeva, et al.
Published / Created: (2019-03-01)

Features of backscatter ionospheric sounding as studied with a chirp ionosonde
By: Ponomarchuk S.N., et al.
Published / Created: (2017-09-01)

Audio-Visual Self-Supervised Terrain Type Recognition for Ground Mobile Platforms
By: Akiyoshi Kurobe, et al.
Published / Created: (2021-01-01)

Self-supervised learning for spinal MRIs
By: Jamaludin, A, et al.
Published / Created: (2017)

Self-Supervised Moving Vehicle Tracking With Stereo Sound
By: Gan, Chuang, et al.
Published / Created: (2021)

Unsupervised discovery of visual object class hierarchies
By: Sivic, J, et al.
Published / Created: (2008)

Localizing visual sounds the hard way
By: Vedaldi, A, et al.
Published / Created: (2021)

Weakly supervised scale-invariant learning of models for visual recognition
By: Fergus, R, et al.
Published / Created: (2006)

Enhancing bowel sound recognition with self-attention and self-supervised pre-training.
By: Yansuo Yu, et al.
Published / Created: (2024-01-01)

Self-supervised co-training for video representation learning
By: Han, T, et al.
Published / Created: (2020)

ESTIMATING ANTENNA COUPLING FACTOR FOR PROBLEM OF TOPSIDE IONOSPHERE SOUNDING FROM SPACE BY CHIRP SIGNALS
By: Podlesnyi A.V., et al.
Published / Created: (2019-12-01)

Self-supervised learning of class embeddings from video
By: Wiles, O, et al.
Published / Created: (2020)

Sight to Sound: An End-to-End Approach for Visual Piano Transcription
By: Koepke, S, et al.
Published / Created: (2020)

ASDNet: An Efficient Self-Supervised Convolutional Network for Anomalous Sound Detection
By: Dewei Kong, et al.
Published / Created: (2025-01-01)

Self-Supervised Transfer Learning from Natural Images for Sound Classification
By: Sungho Shin, et al.
Published / Created: (2021-03-01)

Visually Indicated Sounds
By: Isola, Phillip, et al.
Published / Created: (2017)

Direct Underwater Sound Velocity Measurement Based on the Acousto-Optic Self-Interference Effect between the Chirp Signal and the Optical Frequency Comb
By: Zihui Yang, et al.
Published / Created: (2022-12-01)

Extraction of Individual EEG Gamma Frequencies from the Responses to Click-Based Chirp-Modulated Sounds
By: Aurimas Mockevičius, et al.
Published / Created: (2023-03-01)

Chat2VIS: Generating Data Visualizations via Natural Language Using ChatGPT, Codex and GPT-3 Large Language Models
By: Paula Maddigan, et al.
Published / Created: (2023-01-01)

Disentangled Speech Embeddings Using Cross-Modal Self-Supervision
By: Nagrani, A, et al.
Published / Created: (2020)

Self-Supervised Audio-Visual Co-Segmentation
By: Rouditchenko, Andrew, et al.
Published / Created: (2022)

Self-Supervised Audio-Visual Co-Segmentation
By: Rouditchenko, Andrew, et al.
Published / Created: (2021)

Self-Supervised Autoencoders for Visual Anomaly Detection
By: Alexander Bauer, et al.
Published / Created: (2024-12-01)

Combining Unsupervised and Supervised Learning for Sample Efficient Continuous Language Grounding
By: Oliver Roesler
Published / Created: (2022-09-01)

Weakly-supervised fingerspelling recognition in British Sign Language videos
By: Prajwal, KR, et al.
Published / Created: (2022)

Self-supervised learning of a facial attribute embedding from video
By: Wiles, O, et al.
Published / Created: (2018)

Self-supervised video object segmentation by motion grouping
By: Yang, C, et al.
Published / Created: (2021)

A Climate Hyperspectral Infrared Radiance Product (CHIRP) Combining the AIRS and CrIS Satellite Sounding Record
By: L. Larrabee Strow, et al.
Published / Created: (2021-01-01)

Diagnostics of HF radio channel: based on data from backscatter ionospheric sounding by continuous chirp signal
By: Ponomarchuk S.N., et al.
Published / Created: (2018-06-01)

Application of Optimized Adaptive Chirp Mode Decomposition Method in Chirp Signal
By: Junyuan Wang, et al.
Published / Created: (2020-05-01)

Self-supervised multi-modal alignment for whole body medical imaging
By: Windsor, R, et al.
Published / Created: (2021)

Now you're speaking my language: visual language identification
By: Afouras, T, et al.
Published / Created: (2020)

Exploring the Utility of ChatGPT for Self-directed Online Language Learning
By: Zixi Li, et al.
Published / Created: (2024-09-01)