Separating the “chirp” from the “chat”: self-supervised visual grounding of sound and language

We present DenseAV, a novel dual encoder grounding architecture that learns high-resolution, semantically meaningful, and audio-visually aligned features solely through watching videos. We show that DenseAV can discover the “meaning” of words and the “location” of sounds without explicit localization...

Bibliographic details
Main authors:	Hamilton, M, Zisserman, A, Hershey, JR, Freeman, WT
Format:	Conference item
Language:	English
Published:	IEEE 2024

Similar titles

Multi-task self-supervised visual learning
by: Doersch, C, et al.
Published: (2017)

Ambient Sound Provides Supervision for Visual Learning
by: Owens, Andrew Hale, et al.
Published: (2017)

Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning
by: Owens, Andrew, et al.
Published: (2021)

Self-Supervised Learning for Audio-Visual Relationships of Videos With Stereo Sounds
by: Tomoya Sato, et al.
Published: (2022-01-01)

Self-supervised learning of audio-visual objects from video
by: Afouras, T, et al.
Published: (2020)

Enhancement of sound by soft reflections in exponentially chirped crystals
by: A. Cebrecos, et al.
Published: (2014-12-01)

Music Gesture for Visual Sound Separation
by: Gan, Chuang, et al.
Published: (2021)

First observations of oblique ionospheric sounding chirp signal in Mexico
by: M.A. Sergeeva, et al.
Published: (2019-03-01)

Features of backscatter ionospheric sounding as studied with a chirp ionosonde
by: Ponomarchuk S.N., et al.
Published: (2017-09-01)

Audio-Visual Self-Supervised Terrain Type Recognition for Ground Mobile Platforms
by: Akiyoshi Kurobe, et al.
Published: (2021-01-01)

Self-supervised learning for spinal MRIs
by: Jamaludin, A, et al.
Published: (2017)

Self-Supervised Moving Vehicle Tracking With Stereo Sound
by: Gan, Chuang, et al.
Published: (2021)

Unsupervised discovery of visual object class hierarchies
by: Sivic, J, et al.
Published: (2008)

Localizing visual sounds the hard way
by: Vedaldi, A, et al.
Published: (2021)

Weakly supervised scale-invariant learning of models for visual recognition
by: Fergus, R, et al.
Published: (2006)

Enhancing bowel sound recognition with self-attention and self-supervised pre-training.
by: Yansuo Yu, et al.
Published: (2024-01-01)

Self-supervised co-training for video representation learning
by: Han, T, et al.
Published: (2020)

ESTIMATING ANTENNA COUPLING FACTOR FOR PROBLEM OF TOPSIDE IONOSPHERE SOUNDING FROM SPACE BY CHIRP SIGNALS
by: Podlesnyi A.V., et al.
Published: (2019-12-01)

Self-supervised learning of class embeddings from video
by: Wiles, O, et al.
Published: (2020)

Sight to Sound: An End-to-End Approach for Visual Piano Transcription
by: Koepke, S, et al.
Published: (2020)

ASDNet: An Efficient Self-Supervised Convolutional Network for Anomalous Sound Detection
by: Dewei Kong, et al.
Published: (2025-01-01)

Self-Supervised Transfer Learning from Natural Images for Sound Classification
by: Sungho Shin, et al.
Published: (2021-03-01)

Visually Indicated Sounds
by: Isola, Phillip, et al.
Published: (2017)

Direct Underwater Sound Velocity Measurement Based on the Acousto-Optic Self-Interference Effect between the Chirp Signal and the Optical Frequency Comb
by: Zihui Yang, et al.
Published: (2022-12-01)

Extraction of Individual EEG Gamma Frequencies from the Responses to Click-Based Chirp-Modulated Sounds
by: Aurimas Mockevičius, et al.
Published: (2023-03-01)

Chat2VIS: Generating Data Visualizations via Natural Language Using ChatGPT, Codex and GPT-3 Large Language Models
by: Paula Maddigan, et al.
Published: (2023-01-01)

Disentangled Speech Embeddings Using Cross-Modal Self-Supervision
by: Nagrani, A, et al.
Published: (2020)

Self-Supervised Audio-Visual Co-Segmentation
by: Rouditchenko, Andrew, et al.
Published: (2022)

Self-Supervised Audio-Visual Co-Segmentation
by: Rouditchenko, Andrew, et al.
Published: (2021)

Self-Supervised Autoencoders for Visual Anomaly Detection
by: Alexander Bauer, et al.
Published: (2024-12-01)

Combining Unsupervised and Supervised Learning for Sample Efficient Continuous Language Grounding
by: Oliver Roesler
Published: (2022-09-01)

Weakly-supervised fingerspelling recognition in British Sign Language videos
by: Prajwal, KR, et al.
Published: (2022)

Self-supervised learning of a facial attribute embedding from video
by: Wiles, O, et al.
Published: (2018)

Self-supervised video object segmentation by motion grouping
by: Yang, C, et al.
Published: (2021)

A Climate Hyperspectral Infrared Radiance Product (CHIRP) Combining the AIRS and CrIS Satellite Sounding Record
by: L. Larrabee Strow, et al.
Published: (2021-01-01)

Diagnostics of HF radio channel: based on data from backscatter ionospheric sounding by continuous chirp signal
by: Ponomarchuk S.N., et al.
Published: (2018-06-01)

Application of Optimized Adaptive Chirp Mode Decomposition Method in Chirp Signal
by: Junyuan Wang, et al.
Published: (2020-05-01)

Self-supervised multi-modal alignment for whole body medical imaging
by: Windsor, R, et al.
Published: (2021)

Now you're speaking my language: visual language identification
by: Afouras, T, et al.
Published: (2020)

Exploring the Utility of ChatGPT for Self-directed Online Language Learning
by: Zixi Li, et al.
Published: (2024-09-01)