Task-specific speech enhancement and data augmentation for improved multimodal emotion recognition under noisy conditions

Automatic emotion recognition (AER) systems are burgeoning, and systems based on audio, video, text, or physiological signals have emerged. Multimodal systems, in turn, have been shown to improve overall AER accuracy and to provide some robustness against artifacts and missing data. Collecting...

Bibliographic details
Main authors:	Shruti Kshirsagar, Anurag Pendyala, Tiago H. Falk
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2023-03-01
Collection:	Frontiers in Computer Science
Subjects:	multimodal emotion recognition; BERT based text features; modulation spectrum features; data augmentation; speech enhancement; context-awareness
Online access:	https://www.frontiersin.org/articles/10.3389/fcomp.2023.1039261/full

Similar items

Cross-Language Speech Emotion Recognition Using Bag-of-Word Representations, Domain Adaptation, and Data Augmentation
by: Shruti Kshirsagar, et al.
Published: (2022-08-01)

Multimodal Emotion Recognition Fusion Analysis Adapting BERT With Heterogeneous Feature Unification
by: Sanghyun Lee, et al.
Published: (2021-01-01)

Multimodal Emotion Detection via Attention-Based Fusion of Extracted Facial and Speech Features
by: Dilnoza Mamieva, et al.
Published: (2023-06-01)

Augmenting Multimodal Content Representation with Transformers for Misinformation Detection
by: Jenq-Haur Wang, et al.
Published: (2024-10-01)

Addressing Challenges in Hate Speech Detection using BERT-based Models: A Review
by: Jinan Aljawazeri, et al.
Published: (2024-03-01)

A Feature Fusion Model with Data Augmentation for Speech Emotion Recognition
by: Zhongwen Tu, et al.
Published: (2023-03-01)

TIMIT-TTS: A Text-to-Speech Dataset for Multimodal Synthetic Media Detection
by: Davide Salvi, et al.
Published: (2023-01-01)

Data Augmentation and Effective Feature Selection in Generative Adversarial Networks for Speech Emotion Recognition
by: Arash Shilandari, et al.
Published: (2023-03-01)

Environment-Aware Knowledge Distillation for Improved Resource-Constrained Edge Speech Recognition
by: Arthur Pimentel, et al.
Published: (2023-11-01)

Multimodal spatio-temporal framework for real-world affect recognition
by: Karishma Raut, et al.
Published: (2024-01-01)

An Analysis of Context of Culture and Context of Situation in Obama’s Speech Text
by: Samsudin, et al.
Published: (2020-10-01)

Using BiLSTM Networks for Context-Aware Deep Sensitivity Labelling on Conversational Data
by: Antreas Pogiatzis, et al.
Published: (2020-12-01)

The Reproducibility of Bio-Acoustic Features is Associated With Sample Duration, Speech Task, and Gender
by: Shaykhah A. Almaghrabi, et al.
Published: (2022-01-01)

Strong Generalized Speech Emotion Recognition Based on Effective Data Augmentation
by: Huawei Tao, et al.
Published: (2022-12-01)

Text Augmentation Using BERT for Image Captioning
by: Viktar Atliha, et al.
Published: (2020-08-01)

Internet bad information detection based on Bert model
by: Xin CAI
Published: (2020-11-01)

An Indoor Location-Based Augmented Reality Framework
by: Jehn-Ruey Jiang, et al.
Published: (2023-01-01)

Research on feature extraction of unstructured large power texts
by: WANG Jiakai, et al.
Published: (2024-06-01)

Iranian Speech-language Pathologists’ Awareness of Alternative and Augmentative Communication Methods
by: Talieh Zarifian, et al.
Published: (2021-03-01)

Design of the Speech Emotion Recognition Model
by: Hanping Ke, et al.
Published: (2023-07-01)

Using Data Augmentation and Time-Scale Modification to Improve ASR of Children’s Speech in Noisy Environments
by: Hemant Kumar Kathania, et al.
Published: (2021-09-01)

Text-to-speech system for low-resource language using cross-lingual transfer learning and data augmentation
by: Zolzaya Byambadorj, et al.
Published: (2021-12-01)

Multimodal transformer augmented fusion for speech emotion recognition
by: Yuanyuan Wang, et al.
Published: (2023-05-01)

A multimodal dialog approach to mental state characterization in clinically depressed, anxious, and suicidal populations
by: Joshua Cohen, et al.
Published: (2023-09-01)

Automatic Classification of Speech Dysarthric Intelligibility Levels Using Textual Feature
by: Ghadeer F. Alharbi, et al.
Published: (2025-01-01)

Attention-based speech feature transfer between speakers
by: Hangbok Lee, et al.
Published: (2024-02-01)

Intelligence Context Aware Mobile Navigation using Augmented Reality Technology
by: Ahmad Hoirul Basori, et al.
Published: (2018-04-01)

An improved data augmentation approach and its application in medical named entity recognition
by: Hongyu Chen, et al.
Published: (2024-08-01)

Comprehensive Context Recognizer Based on Multimodal Sensors in a Smartphone
by: Sungyoung Lee, et al.
Published: (2012-09-01)

Spoof speech detection based on context information and attention feature
by: Jia CHEN, et al.
Published: (2023-02-01)

Medical Named Entity Recognition Fusing Part-of-Speech and Stroke Features
by: Fen Yi, et al.
Published: (2023-08-01)

A Hybrid Deep Learning Emotion Classification System Using Multimodal Data
by: Dong-Hwi Kim, et al.
Published: (2023-11-01)

Emotional Text-To-Speech in Japanese Using Artificially Augmented Dataset
by: Mujahid Jamal A. Khalifah, et al.
Published: (2024-01-01)

Bidirectional Feature Fusion and Enhanced Alignment Based Multimodal Semantic Segmentation for Remote Sensing Images
by: Qianqian Liu, et al.
Published: (2024-06-01)

Disaster Image Classification by Fusing Multimodal Social Media Data
by: Zhiqiang Zou, et al.
Published: (2021-09-01)

Lip2Speech: Lightweight Multi-Speaker Speech Reconstruction with Gabor Features
by: Zhongping Dong, et al.
Published: (2024-01-01)

Large language models and speech genre systematicity
by: Devyatkin, Dmitry Alekseevich, et al.
Published: (2025-02-01)

Effect on speech emotion classification of a feature selection approach using a convolutional neural network
by: Ammar Amjad, et al.
Published: (2021-11-01)