Separating the “chirp” from the “chat”: self-supervised visual grounding of sound and language

We present DenseAV, a novel dual encoder grounding architecture that learns high-resolution, semantically meaningful, and audio-visually aligned features solely through watching videos. We show that DenseAV can discover the “meaning” of words and the “location” of sounds without explicit localization...

Bibliographic Details
Main Authors:	Hamilton, M, Zisserman, A, Hershey, JR, Freeman, WT
Format:	Conference item
Language:	English
Published:	IEEE 2024

Similar Items

Multi-task self-supervised visual learning
by: Doersch, C, et al.
Published: (2017)

Ambient Sound Provides Supervision for Visual Learning
by: Owens, Andrew Hale, et al.
Published: (2017)

Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning
by: Owens, Andrew, et al.
Published: (2021)

Self-Supervised Learning for Audio-Visual Relationships of Videos With Stereo Sounds
by: Tomoya Sato, et al.
Published: (2022-01-01)

Self-supervised learning of audio-visual objects from video
by: Afouras, T, et al.
Published: (2020)

Enhancement of sound by soft reflections in exponentially chirped crystals
by: A. Cebrecos, et al.
Published: (2014-12-01)

Music Gesture for Visual Sound Separation
by: Gan, Chuang, et al.
Published: (2021)

First observations of oblique ionospheric sounding chirp signal in Mexico
by: M.A. Sergeeva, et al.
Published: (2019-03-01)

Features of backscatter ionospheric sounding as studied with a chirp ionosonde
by: Ponomarchuk S.N., et al.
Published: (2017-09-01)

Audio-Visual Self-Supervised Terrain Type Recognition for Ground Mobile Platforms
by: Akiyoshi Kurobe, et al.
Published: (2021-01-01)

Self-supervised learning for spinal MRIs
by: Jamaludin, A, et al.
Published: (2017)

Self-Supervised Moving Vehicle Tracking With Stereo Sound
by: Gan, Chuang, et al.
Published: (2021)

Unsupervised discovery of visual object class hierarchies
by: Sivic, J, et al.
Published: (2008)

Localizing visual sounds the hard way
by: Vedaldi, A, et al.
Published: (2021)

Weakly supervised scale-invariant learning of models for visual recognition
by: Fergus, R, et al.
Published: (2006)

Enhancing bowel sound recognition with self-attention and self-supervised pre-training
by: Yansuo Yu, et al.
Published: (2024-01-01)

Self-supervised co-training for video representation learning
by: Han, T, et al.
Published: (2020)

ESTIMATING ANTENNA COUPLING FACTOR FOR PROBLEM OF TOPSIDE IONOSPHERE SOUNDING FROM SPACE BY CHIRP SIGNALS
by: Podlesnyi A.V., et al.
Published: (2019-12-01)

Self-supervised learning of class embeddings from video
by: Wiles, O, et al.
Published: (2020)

Sight to Sound: An End-to-End Approach for Visual Piano Transcription
by: Koepke, S, et al.
Published: (2020)

ASDNet: An Efficient Self-Supervised Convolutional Network for Anomalous Sound Detection
by: Dewei Kong, et al.
Published: (2025-01-01)

Self-Supervised Transfer Learning from Natural Images for Sound Classification
by: Sungho Shin, et al.
Published: (2021-03-01)

Visually Indicated Sounds
by: Isola, Phillip, et al.
Published: (2017)

Direct Underwater Sound Velocity Measurement Based on the Acousto-Optic Self-Interference Effect between the Chirp Signal and the Optical Frequency Comb
by: Zihui Yang, et al.
Published: (2022-12-01)

Extraction of Individual EEG Gamma Frequencies from the Responses to Click-Based Chirp-Modulated Sounds
by: Aurimas Mockevičius, et al.
Published: (2023-03-01)

Chat2VIS: Generating Data Visualizations via Natural Language Using ChatGPT, Codex and GPT-3 Large Language Models
by: Paula Maddigan, et al.
Published: (2023-01-01)

Disentangled Speech Embeddings Using Cross-Modal Self-Supervision
by: Nagrani, A, et al.
Published: (2020)

Self-Supervised Audio-Visual Co-Segmentation
by: Rouditchenko, Andrew, et al.
Published: (2022)

Self-Supervised Audio-Visual Co-Segmentation
by: Rouditchenko, Andrew, et al.
Published: (2021)

Self-Supervised Autoencoders for Visual Anomaly Detection
by: Alexander Bauer, et al.
Published: (2024-12-01)

Combining Unsupervised and Supervised Learning for Sample Efficient Continuous Language Grounding
by: Oliver Roesler
Published: (2022-09-01)

Weakly-supervised fingerspelling recognition in British Sign Language videos
by: Prajwal, KR, et al.
Published: (2022)

Self-supervised learning of a facial attribute embedding from video
by: Wiles, O, et al.
Published: (2018)

Self-supervised video object segmentation by motion grouping
by: Yang, C, et al.
Published: (2021)

A Climate Hyperspectral Infrared Radiance Product (CHIRP) Combining the AIRS and CrIS Satellite Sounding Record
by: L. Larrabee Strow, et al.
Published: (2021-01-01)

Diagnostics of HF radio channel: based on data from backscatter ionospheric sounding by continuous chirp signal
by: Ponomarchuk S.N., et al.
Published: (2018-06-01)

Application of Optimized Adaptive Chirp Mode Decomposition Method in Chirp Signal
by: Junyuan Wang, et al.
Published: (2020-05-01)

Self-supervised multi-modal alignment for whole body medical imaging
by: Windsor, R, et al.
Published: (2021)

Now you're speaking my language: visual language identification
by: Afouras, T, et al.
Published: (2020)

Exploring the Utility of ChatGPT for Self-directed Online Language Learning
by: Zixi Li, et al.
Published: (2024-09-01)