Separating the “chirp” from the “chat”: self-supervised visual grounding of sound and language

We present DenseAV, a novel dual encoder grounding architecture that learns high-resolution, semantically meaningful, and audio-visual aligned features solely through watching videos. We show that DenseAV can discover the “meaning” of words and the “location” of sounds without explicit localization...

Bibliographic details
Main authors:	Hamilton, M, Zisserman, A, Hershey, JR, Freeman, WT
Format:	Conference item
Language:	English
Published:	IEEE 2024

Similar items

Multi-task self-supervised visual learning
by: Doersch, C, et al.
Published: (2017)

Ambient Sound Provides Supervision for Visual Learning
by: Owens, Andrew Hale, et al.
Published: (2017)

Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning
by: Owens, Andrew, et al.
Published: (2021)

Self-Supervised Learning for Audio-Visual Relationships of Videos With Stereo Sounds
by: Tomoya Sato, et al.
Published: (2022-01-01)

Self-supervised learning of audio-visual objects from video
by: Afouras, T, et al.
Published: (2020)

Enhancement of sound by soft reflections in exponentially chirped crystals
by: A. Cebrecos, et al.
Published: (2014-12-01)

Music Gesture for Visual Sound Separation
by: Gan, Chuang, et al.
Published: (2021)

First observations of oblique ionospheric sounding chirp signal in Mexico
by: M.A. Sergeeva, et al.
Published: (2019-03-01)

Features of backscatter ionospheric sounding as studied with a chirp ionosonde
by: Ponomarchuk S.N., et al.
Published: (2017-09-01)

Audio-Visual Self-Supervised Terrain Type Recognition for Ground Mobile Platforms
by: Akiyoshi Kurobe, et al.
Published: (2021-01-01)

Self-supervised learning for spinal MRIs
by: Jamaludin, A, et al.
Published: (2017)

Self-Supervised Moving Vehicle Tracking With Stereo Sound
by: Gan, Chuang, et al.
Published: (2021)

Unsupervised discovery of visual object class hierarchies
by: Sivic, J, et al.
Published: (2008)

Localizing visual sounds the hard way
by: Vedaldi, A, et al.
Published: (2021)

Weakly supervised scale-invariant learning of models for visual recognition
by: Fergus, R, et al.
Published: (2006)

Enhancing bowel sound recognition with self-attention and self-supervised pre-training
by: Yansuo Yu, et al.
Published: (2024-01-01)

Self-supervised co-training for video representation learning
by: Han, T, et al.
Published: (2020)

ESTIMATING ANTENNA COUPLING FACTOR FOR PROBLEM OF TOPSIDE IONOSPHERE SOUNDING FROM SPACE BY CHIRP SIGNALS
by: Podlesnyi A.V., et al.
Published: (2019-12-01)

Self-supervised learning of class embeddings from video
by: Wiles, O, et al.
Published: (2020)

Sight to Sound: An End-to-End Approach for Visual Piano Transcription
by: Koepke, S, et al.
Published: (2020)

ASDNet: An Efficient Self-Supervised Convolutional Network for Anomalous Sound Detection
by: Dewei Kong, et al.
Published: (2025-01-01)

Self-Supervised Transfer Learning from Natural Images for Sound Classification
by: Sungho Shin, et al.
Published: (2021-03-01)

Visually Indicated Sounds
by: Isola, Phillip, et al.
Published: (2017)

Direct Underwater Sound Velocity Measurement Based on the Acousto-Optic Self-Interference Effect between the Chirp Signal and the Optical Frequency Comb
by: Zihui Yang, et al.
Published: (2022-12-01)

Extraction of Individual EEG Gamma Frequencies from the Responses to Click-Based Chirp-Modulated Sounds
by: Aurimas Mockevičius, et al.
Published: (2023-03-01)

Chat2VIS: Generating Data Visualizations via Natural Language Using ChatGPT, Codex and GPT-3 Large Language Models
by: Paula Maddigan, et al.
Published: (2023-01-01)

Disentangled Speech Embeddings Using Cross-Modal Self-Supervision
by: Nagrani, A, et al.
Published: (2020)

Self-Supervised Audio-Visual Co-Segmentation
by: Rouditchenko, Andrew, et al.
Published: (2022)

Self-Supervised Audio-Visual Co-Segmentation
by: Rouditchenko, Andrew, et al.
Published: (2021)

Self-Supervised Autoencoders for Visual Anomaly Detection
by: Alexander Bauer, et al.
Published: (2024-12-01)

Combining Unsupervised and Supervised Learning for Sample Efficient Continuous Language Grounding
by: Oliver Roesler
Published: (2022-09-01)

Weakly-supervised fingerspelling recognition in British Sign Language videos
by: Prajwal, KR, et al.
Published: (2022)

Self-supervised learning of a facial attribute embedding from video
by: Wiles, O, et al.
Published: (2018)

Self-supervised video object segmentation by motion grouping
by: Yang, C, et al.
Published: (2021)

A Climate Hyperspectral Infrared Radiance Product (CHIRP) Combining the AIRS and CrIS Satellite Sounding Record
by: L. Larrabee Strow, et al.
Published: (2021-01-01)

Diagnostics of HF radio channel: based on data from backscatter ionospheric sounding by continuous chirp signal
by: Ponomarchuk S.N., et al.
Published: (2018-06-01)

Application of Optimized Adaptive Chirp Mode Decomposition Method in Chirp Signal
by: Junyuan Wang, et al.
Published: (2020-05-01)

Self-supervised multi-modal alignment for whole body medical imaging
by: Windsor, R, et al.
Published: (2021)

Now you're speaking my language: visual language identification
by: Afouras, T, et al.
Published: (2020)

Exploring the Utility of ChatGPT for Self-directed Online Language Learning
by: Zixi Li, et al.
Published: (2024-09-01)