Separating the “chirp” from the “chat”: self-supervised visual grounding of sound and language

We present DenseAV, a novel dual encoder grounding architecture that learns high-resolution, semantically meaningful, and audio-visual aligned features solely through watching videos. We show that DenseAV can discover the “meaning” of words and the “location” of sounds without explicit localization...

Bibliographic details
Main authors:	Hamilton, M, Zisserman, A, Hershey, JR, Freeman, WT
Format:	Conference item
Language:	English
Published:	IEEE 2024

Similar items

Multi-task self-supervised visual learning
by: Doersch, C, et al.
Published: (2017)

Ambient Sound Provides Supervision for Visual Learning
by: Owens, Andrew Hale, et al.
Published: (2017)

Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning
by: Owens, Andrew, et al.
Published: (2021)

Self-Supervised Learning for Audio-Visual Relationships of Videos With Stereo Sounds
by: Tomoya Sato, et al.
Published: (2022-01-01)

Self-supervised learning of audio-visual objects from video
by: Afouras, T, et al.
Published: (2020)

Enhancement of sound by soft reflections in exponentially chirped crystals
by: A. Cebrecos, et al.
Published: (2014-12-01)

Music Gesture for Visual Sound Separation
by: Gan, Chuang, et al.
Published: (2021)

First observations of oblique ionospheric sounding chirp signal in Mexico
by: M.A. Sergeeva, et al.
Published: (2019-03-01)

Features of backscatter ionospheric sounding as studied with a chirp ionosonde
by: Ponomarchuk S.N., et al.
Published: (2017-09-01)

Audio-Visual Self-Supervised Terrain Type Recognition for Ground Mobile Platforms
by: Akiyoshi Kurobe, et al.
Published: (2021-01-01)

Self-supervised learning for spinal MRIs
by: Jamaludin, A, et al.
Published: (2017)

Self-Supervised Moving Vehicle Tracking With Stereo Sound
by: Gan, Chuang, et al.
Published: (2021)

Unsupervised discovery of visual object class hierarchies
by: Sivic, J, et al.
Published: (2008)

Localizing visual sounds the hard way
by: Vedaldi, A, et al.
Published: (2021)

Weakly supervised scale-invariant learning of models for visual recognition
by: Fergus, R, et al.
Published: (2006)

Enhancing bowel sound recognition with self-attention and self-supervised pre-training.
by: Yansuo Yu, et al.
Published: (2024-01-01)

Self-supervised co-training for video representation learning
by: Han, T, et al.
Published: (2020)

ESTIMATING ANTENNA COUPLING FACTOR FOR PROBLEM OF TOPSIDE IONOSPHERE SOUNDING FROM SPACE BY CHIRP SIGNALS
by: Podlesnyi A.V., et al.
Published: (2019-12-01)

Self-supervised learning of class embeddings from video
by: Wiles, O, et al.
Published: (2020)

Sight to Sound: An End-to-End Approach for Visual Piano Transcription
by: Koepke, S, et al.
Published: (2020)

ASDNet: An Efficient Self-Supervised Convolutional Network for Anomalous Sound Detection
by: Dewei Kong, et al.
Published: (2025-01-01)

Self-Supervised Transfer Learning from Natural Images for Sound Classification
by: Sungho Shin, et al.
Published: (2021-03-01)

Visually Indicated Sounds
by: Isola, Phillip, et al.
Published: (2017)

Direct Underwater Sound Velocity Measurement Based on the Acousto-Optic Self-Interference Effect between the Chirp Signal and the Optical Frequency Comb
by: Zihui Yang, et al.
Published: (2022-12-01)

Extraction of Individual EEG Gamma Frequencies from the Responses to Click-Based Chirp-Modulated Sounds
by: Aurimas Mockevičius, et al.
Published: (2023-03-01)

Chat2VIS: Generating Data Visualizations via Natural Language Using ChatGPT, Codex and GPT-3 Large Language Models
by: Paula Maddigan, et al.
Published: (2023-01-01)

Disentangled Speech Embeddings Using Cross-Modal Self-Supervision
by: Nagrani, A, et al.
Published: (2020)

Self-Supervised Audio-Visual Co-Segmentation
by: Rouditchenko, Andrew, et al.
Published: (2022)

Self-Supervised Audio-Visual Co-Segmentation
by: Rouditchenko, Andrew, et al.
Published: (2021)

Self-Supervised Autoencoders for Visual Anomaly Detection
by: Alexander Bauer, et al.
Published: (2024-12-01)

Combining Unsupervised and Supervised Learning for Sample Efficient Continuous Language Grounding
by: Oliver Roesler
Published: (2022-09-01)

Weakly-supervised fingerspelling recognition in British Sign Language videos
by: Prajwal, KR, et al.
Published: (2022)

Self-supervised learning of a facial attribute embedding from video
by: Wiles, O, et al.
Published: (2018)

Self-supervised video object segmentation by motion grouping
by: Yang, C, et al.
Published: (2021)

A Climate Hyperspectral Infrared Radiance Product (CHIRP) Combining the AIRS and CrIS Satellite Sounding Record
by: L. Larrabee Strow, et al.
Published: (2021-01-01)

Diagnostics of HF radio channel: based on data from backscatter ionospheric sounding by continuous chirp signal
by: Ponomarchuk S.N., et al.
Published: (2018-06-01)

Application of Optimized Adaptive Chirp Mode Decomposition Method in Chirp Signal
by: Junyuan Wang, et al.
Published: (2020-05-01)

Self-supervised multi-modal alignment for whole body medical imaging
by: Windsor, R, et al.
Published: (2021)

Now you're speaking my language: visual language identification
by: Afouras, T, et al.
Published: (2020)

Exploring the Utility of ChatGPT for Self-directed Online Language Learning
by: Zixi Li, et al.
Published: (2024-09-01)