Emotion recognition in speech using cross-modal transfer in the wild

Obtaining large, human-labelled speech datasets to train models for emotion recognition is a notoriously challenging task, hindered by annotation cost and label ambiguity. In this work, we consider the task of learning embeddings for speech classification without access to any form of labelled audio...

Bibliographic Details
Main Authors:	Albanie, S; Nagrani, A; Vedaldi, A; Zisserman, A
Format:	Internet publication
Language:	English
Published:	arXiv 2018

Similar Items

Emotion Recognition in Speech using Cross-Modal Transfer in the Wild
by: Albanie, S, et al.
Published: (2018)

Disentangled Speech Embeddings Using Cross-Modal Self-Supervision
by: Nagrani, A, et al.
Published: (2020)

Learnable PINs: Cross-modal embeddings for person identity
by: Nagrani, A, et al.
Published: (2018)

Seeing voices and hearing faces: Cross-modal biometric matching
by: Nagrani, A, et al.
Published: (2018)

Speech2Action: Cross-modal supervision for action recognition
by: Nagrani, A, et al.
Published: (2020)

Utterance-level aggregation for speaker recognition in the wild
by: Xie, W, et al.
Published: (2019)

Audio-visual synchronisation in the wild
by: Chen, H, et al.
Published: (2021)

Count, crop and recognise: fine-grained recognition in the wild
by: Bain, M, et al.
Published: (2020)

Use what you have: Video retrieval using representations from collaborative experts
by: Liu, Y, et al.
Published: (2020)

Chimpanzee face recognition from videos in the wild using deep learning
by: Schofield, D, et al.
Published: (2019)

Electroglottograph-Based Speech Emotion Recognition via Cross-Modal Distillation
by: Lijiang Chen, et al.
Published: (2022-04-01)

Cross-Modal Dynamic Transfer Learning for Multimodal Emotion Recognition
by: Soyeon Hong, et al.
Published: (2024-01-01)

Multi-modal Correlated Network for emotion recognition in speech
by: Minjie Ren, et al.
Published: (2019-09-01)

Multi-Modal Emotion Recognition Using Speech Features and Text-Embedding
by: Sung-Woo Byun, et al.
Published: (2021-08-01)

Progressively Discriminative Transfer Network for Cross-Corpus Speech Emotion Recognition
by: Cheng Lu, et al.
Published: (2022-07-01)

Transfer Subspace Learning for Unsupervised Cross-Corpus Speech Emotion Recognition
by: Na Liu, et al.
Published: (2021-01-01)

TEACHTEXT: CrossModal generalized distillation for text-video retrieval
by: Croitoru, I, et al.
Published: (2022)

VoxCeleb2: Deep speaker recognition
by: Chung, J, et al.
Published: (2018)

VoxCeleb: large-scale speaker verification in the wild
by: Nagrani, A, et al.
Published: (2019)

TeachText: CrossModal text-video retrieval through generalized distillation
by: Croitoru, I, et al.
Published: (2024)

Spot the conversation: Speaker diarisation in the wild
by: Chung, JS, et al.
Published: (2020)

Self-attention transfer networks for speech emotion recognition
by: Ziping Zhao, et al.
Published: (2021-02-01)

GCF2-Net: global-aware cross-modal feature fusion network for speech emotion recognition
by: Feng Li, et al.
Published: (2023-05-01)

Speech Emotion Recognition Based on Transfer Emotion-Discriminative Features Subspace Learning
by: Zhang Kexin, et al.
Published: (2023-01-01)

Fisher vector faces in the wild
by: Simonyan, K, et al.
Published: (2013)

Deep face recognition
by: Parkhi, O, et al.
Published: (2015)

Slow-fast auditory streams for audio recognition
by: Kazakos, E, et al.
Published: (2021)

Reading to listen at the cocktail party: multi-modal speech separation
by: Rahimi, A, et al.
Published: (2022)

Learning grimaces by watching TV
by: Albanie, S, et al.
Published: (2016)

Speech Emotion Recognition Using Deep Learning Transfer Models and Explainable Techniques
by: Tae-Wan Kim, et al.
Published: (2024-02-01)

Multi-Modal Fusion Emotion Recognition Method of Speech Expression Based on Deep Learning
by: Dong Liu, et al.
Published: (2021-07-01)

Emotion recognition based on multi-modal physiological signals and transfer learning
by: Zhongzheng Fu, et al.
Published: (2022-09-01)

Reading text in the wild with convolutional neural networks
by: Jaderberg, M, et al.
Published: (2014)

Reading text in the wild with convolutional neural networks
by: Jaderberg, M, et al.
Published: (2015)

EPIC-fusion: audio-visual temporal binding for egocentric action recognition
by: Kazakos, E, et al.
Published: (2020)

Speech Emotion Recognition Using Transfer Learning: Integration of Advanced Speaker Embeddings and Image Recognition Models
by: Maros Jakubec, et al.
Published: (2024-10-01)

Learning to read by spelling: towards unsupervised text recognition
by: Gupta, A, et al.
Published: (2018)

End-to-End Modeling and Transfer Learning for Audiovisual Emotion Recognition in-the-Wild
by: Denis Dresvyanskiy, et al.
Published: (2022-01-01)

Learning to read by spelling: towards unsupervised text recognition
by: Gupta, A, et al.
Published: (2020)

Preprocessing signal for Speech Emotion Recognition
by: Bashar M. Nema, et al.
Published: (2018-07-01)