Separating the “chirp” from the “chat”: self-supervised visual grounding of sound and language

We present DenseAV, a novel dual encoder grounding architecture that learns high-resolution, semantically meaningful, and audio-visual aligned features solely through watching videos. We show that DenseAV can discover the “meaning” of words and the “location” of sounds without explicit localization...

Bibliographic details
Main authors:	Hamilton, M, Zisserman, A, Hershey, JR, Freeman, WT
Format:	Conference item
Language:	English
Published:	IEEE 2024

Similar items

Multi-task self-supervised visual learning
by: Doersch, C, et al.
Published: (2017)

Ambient Sound Provides Supervision for Visual Learning
by: Owens, Andrew Hale, et al.
Published: (2017)

Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning
by: Owens, Andrew, et al.
Published: (2021)

Self-Supervised Learning for Audio-Visual Relationships of Videos With Stereo Sounds
by: Tomoya Sato, et al.
Published: (2022-01-01)

Self-supervised learning of audio-visual objects from video
by: Afouras, T, et al.
Published: (2020)

Enhancement of sound by soft reflections in exponentially chirped crystals
by: A. Cebrecos, et al.
Published: (2014-12-01)

Music Gesture for Visual Sound Separation
by: Gan, Chuang, et al.
Published: (2021)

First observations of oblique ionospheric sounding chirp signal in Mexico
by: M.A. Sergeeva, et al.
Published: (2019-03-01)

Features of backscatter ionospheric sounding as studied with a chirp ionosonde
by: Ponomarchuk S.N., et al.
Published: (2017-09-01)

Audio-Visual Self-Supervised Terrain Type Recognition for Ground Mobile Platforms
by: Akiyoshi Kurobe, et al.
Published: (2021-01-01)

Self-supervised learning for spinal MRIs
by: Jamaludin, A, et al.
Published: (2017)

Self-Supervised Moving Vehicle Tracking With Stereo Sound
by: Gan, Chuang, et al.
Published: (2021)

Unsupervised discovery of visual object class hierarchies
by: Sivic, J, et al.
Published: (2008)

Localizing visual sounds the hard way
by: Vedaldi, A, et al.
Published: (2021)

Weakly supervised scale-invariant learning of models for visual recognition
by: Fergus, R, et al.
Published: (2006)

Enhancing bowel sound recognition with self-attention and self-supervised pre-training.
by: Yansuo Yu, et al.
Published: (2024-01-01)

Self-supervised co-training for video representation learning
by: Han, T, et al.
Published: (2020)

ESTIMATING ANTENNA COUPLING FACTOR FOR PROBLEM OF TOPSIDE IONOSPHERE SOUNDING FROM SPACE BY CHIRP SIGNALS
by: Podlesnyi A.V., et al.
Published: (2019-12-01)

Self-supervised learning of class embeddings from video
by: Wiles, O, et al.
Published: (2020)

Sight to Sound: An End-to-End Approach for Visual Piano Transcription
by: Koepke, S, et al.
Published: (2020)

ASDNet: An Efficient Self-Supervised Convolutional Network for Anomalous Sound Detection
by: Dewei Kong, et al.
Published: (2025-01-01)

Self-Supervised Transfer Learning from Natural Images for Sound Classification
by: Sungho Shin, et al.
Published: (2021-03-01)

Visually Indicated Sounds
by: Isola, Phillip, et al.
Published: (2017)

Direct Underwater Sound Velocity Measurement Based on the Acousto-Optic Self-Interference Effect between the Chirp Signal and the Optical Frequency Comb
by: Zihui Yang, et al.
Published: (2022-12-01)

Extraction of Individual EEG Gamma Frequencies from the Responses to Click-Based Chirp-Modulated Sounds
by: Aurimas Mockevičius, et al.
Published: (2023-03-01)

Chat2VIS: Generating Data Visualizations via Natural Language Using ChatGPT, Codex and GPT-3 Large Language Models
by: Paula Maddigan, et al.
Published: (2023-01-01)

Disentangled Speech Embeddings Using Cross-Modal Self-Supervision
by: Nagrani, A, et al.
Published: (2020)

Self-Supervised Audio-Visual Co-Segmentation
by: Rouditchenko, Andrew, et al.
Published: (2022)

Self-Supervised Audio-Visual Co-Segmentation
by: Rouditchenko, Andrew, et al.
Published: (2021)

Self-Supervised Autoencoders for Visual Anomaly Detection
by: Alexander Bauer, et al.
Published: (2024-12-01)

Combining Unsupervised and Supervised Learning for Sample Efficient Continuous Language Grounding
by: Oliver Roesler
Published: (2022-09-01)

Weakly-supervised fingerspelling recognition in British Sign Language videos
by: Prajwal, KR, et al.
Published: (2022)

Self-supervised learning of a facial attribute embedding from video
by: Wiles, O, et al.
Published: (2018)

Self-supervised video object segmentation by motion grouping
by: Yang, C, et al.
Published: (2021)

A Climate Hyperspectral Infrared Radiance Product (CHIRP) Combining the AIRS and CrIS Satellite Sounding Record
by: L. Larrabee Strow, et al.
Published: (2021-01-01)

Diagnostics of HF radio channel: based on data from backscatter ionospheric sounding by continuous chirp signal
by: Ponomarchuk S.N., et al.
Published: (2018-06-01)

Application of Optimized Adaptive Chirp Mode Decomposition Method in Chirp Signal
by: Junyuan Wang, et al.
Published: (2020-05-01)

Self-supervised multi-modal alignment for whole body medical imaging
by: Windsor, R, et al.
Published: (2021)

Now you're speaking my language: visual language identification
by: Afouras, T, et al.
Published: (2020)

Exploring the Utility of ChatGPT for Self-directed Online Language Learning
by: Zixi Li, et al.
Published: (2024-09-01)