Separating the “chirp” from the “chat”: self-supervised visual grounding of sound and language

We present DenseAV, a novel dual encoder grounding architecture that learns high-resolution, semantically meaningful, and audio-visual aligned features solely through watching videos. We show that DenseAV can discover the “meaning” of words and the “location” of sounds without explicit localization...

Bibliographic details
Main authors:	Hamilton, M, Zisserman, A, Hershey, JR, Freeman, WT
Format:	Conference item
Language:	English
Published:	IEEE 2024

Similar items

Multi-task self-supervised visual learning
by: Doersch, C, et al.
Published: (2017)

Ambient Sound Provides Supervision for Visual Learning
by: Owens, Andrew Hale, et al.
Published: (2017)

Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning
by: Owens, Andrew, et al.
Published: (2021)

Self-Supervised Learning for Audio-Visual Relationships of Videos With Stereo Sounds
by: Tomoya Sato, et al.
Published: (2022-01-01)

Self-supervised learning of audio-visual objects from video
by: Afouras, T, et al.
Published: (2020)

Enhancement of sound by soft reflections in exponentially chirped crystals
by: A. Cebrecos, et al.
Published: (2014-12-01)

Music Gesture for Visual Sound Separation
by: Gan, Chuang, et al.
Published: (2021)

First observations of oblique ionospheric sounding chirp signal in Mexico
by: M.A. Sergeeva, et al.
Published: (2019-03-01)

Features of backscatter ionospheric sounding as studied with a chirp ionosonde
by: Ponomarchuk S.N., et al.
Published: (2017-09-01)

Audio-Visual Self-Supervised Terrain Type Recognition for Ground Mobile Platforms
by: Akiyoshi Kurobe, et al.
Published: (2021-01-01)

Self-supervised learning for spinal MRIs
by: Jamaludin, A, et al.
Published: (2017)

Self-Supervised Moving Vehicle Tracking With Stereo Sound
by: Gan, Chuang, et al.
Published: (2021)

Unsupervised discovery of visual object class hierarchies
by: Sivic, J, et al.
Published: (2008)

Localizing visual sounds the hard way
by: Vedaldi, A, et al.
Published: (2021)

Weakly supervised scale-invariant learning of models for visual recognition
by: Fergus, R, et al.
Published: (2006)

Enhancing bowel sound recognition with self-attention and self-supervised pre-training
by: Yansuo Yu, et al.
Published: (2024-01-01)

Self-supervised co-training for video representation learning
by: Han, T, et al.
Published: (2020)

ESTIMATING ANTENNA COUPLING FACTOR FOR PROBLEM OF TOPSIDE IONOSPHERE SOUNDING FROM SPACE BY CHIRP SIGNALS
by: Podlesnyi A.V., et al.
Published: (2019-12-01)

Self-supervised learning of class embeddings from video
by: Wiles, O, et al.
Published: (2020)

Sight to Sound: An End-to-End Approach for Visual Piano Transcription
by: Koepke, S, et al.
Published: (2020)

ASDNet: An Efficient Self-Supervised Convolutional Network for Anomalous Sound Detection
by: Dewei Kong, et al.
Published: (2025-01-01)

Self-Supervised Transfer Learning from Natural Images for Sound Classification
by: Sungho Shin, et al.
Published: (2021-03-01)

Visually Indicated Sounds
by: Isola, Phillip, et al.
Published: (2017)

Direct Underwater Sound Velocity Measurement Based on the Acousto-Optic Self-Interference Effect between the Chirp Signal and the Optical Frequency Comb
by: Zihui Yang, et al.
Published: (2022-12-01)

Extraction of Individual EEG Gamma Frequencies from the Responses to Click-Based Chirp-Modulated Sounds
by: Aurimas Mockevičius, et al.
Published: (2023-03-01)

Chat2VIS: Generating Data Visualizations via Natural Language Using ChatGPT, Codex and GPT-3 Large Language Models
by: Paula Maddigan, et al.
Published: (2023-01-01)

Disentangled Speech Embeddings Using Cross-Modal Self-Supervision
by: Nagrani, A, et al.
Published: (2020)

Self-Supervised Audio-Visual Co-Segmentation
by: Rouditchenko, Andrew, et al.
Published: (2022)

Self-Supervised Audio-Visual Co-Segmentation
by: Rouditchenko, Andrew, et al.
Published: (2021)

Self-Supervised Autoencoders for Visual Anomaly Detection
by: Alexander Bauer, et al.
Published: (2024-12-01)

Combining Unsupervised and Supervised Learning for Sample Efficient Continuous Language Grounding
by: Oliver Roesler
Published: (2022-09-01)

Weakly-supervised fingerspelling recognition in British Sign Language videos
by: Prajwal, KR, et al.
Published: (2022)

Self-supervised learning of a facial attribute embedding from video
by: Wiles, O, et al.
Published: (2018)

Self-supervised video object segmentation by motion grouping
by: Yang, C, et al.
Published: (2021)

A Climate Hyperspectral Infrared Radiance Product (CHIRP) Combining the AIRS and CrIS Satellite Sounding Record
by: L. Larrabee Strow, et al.
Published: (2021-01-01)

Diagnostics of HF radio channel: based on data from backscatter ionospheric sounding by continuous chirp signal
by: Ponomarchuk S.N., et al.
Published: (2018-06-01)

Application of Optimized Adaptive Chirp Mode Decomposition Method in Chirp Signal
by: Junyuan Wang, et al.
Published: (2020-05-01)

Self-supervised multi-modal alignment for whole body medical imaging
by: Windsor, R, et al.
Published: (2021)

Now you're speaking my language: visual language identification
by: Afouras, T, et al.
Published: (2020)

Exploring the Utility of ChatGPT for Self-directed Online Language Learning
by: Zixi Li, et al.
Published: (2024-09-01)