Task-specific speech enhancement and data augmentation for improved multimodal emotion recognition under noisy conditions

Automatic emotion recognition (AER) systems are burgeoning, and systems based on audio, video, text, or physiological signals have emerged. Multimodal systems, in turn, have been shown to improve overall AER accuracy and to provide some robustness against artifacts and missing data. Collecting...

Bibliographic Details
Main Authors:	Shruti Kshirsagar, Anurag Pendyala, Tiago H. Falk
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2023-03-01
Series:	Frontiers in Computer Science
Subjects:	multimodal emotion recognition; BERT based text features; modulation spectrum features; data augmentation; speech enhancement; context-awareness
Online Access:	https://www.frontiersin.org/articles/10.3389/fcomp.2023.1039261/full

Similar Items

Cross-Language Speech Emotion Recognition Using Bag-of-Word Representations, Domain Adaptation, and Data Augmentation
By: Shruti Kshirsagar, et al.
Published: (2022-08-01)

Multimodal Emotion Recognition Fusion Analysis Adapting BERT With Heterogeneous Feature Unification
By: Sanghyun Lee, et al.
Published: (2021-01-01)

Multimodal Emotion Detection via Attention-Based Fusion of Extracted Facial and Speech Features
By: Dilnoza Mamieva, et al.
Published: (2023-06-01)

Augmenting Multimodal Content Representation with Transformers for Misinformation Detection
By: Jenq-Haur Wang, et al.
Published: (2024-10-01)

Addressing Challenges in Hate Speech Detection using BERT-based Models: A Review
By: Jinan Aljawazeri, et al.
Published: (2024-03-01)

A Feature Fusion Model with Data Augmentation for Speech Emotion Recognition
By: Zhongwen Tu, et al.
Published: (2023-03-01)

TIMIT-TTS: A Text-to-Speech Dataset for Multimodal Synthetic Media Detection
By: Davide Salvi, et al.
Published: (2023-01-01)

Data Augmentation and Effective Feature Selection in Generative Adversarial Networks for Speech Emotion Recognition
By: Arash Shilandari, et al.
Published: (2023-03-01)

Environment-Aware Knowledge Distillation for Improved Resource-Constrained Edge Speech Recognition
By: Arthur Pimentel, et al.
Published: (2023-11-01)

Multimodal spatio-temporal framework for real-world affect recognition
By: Karishma Raut, et al.
Published: (2024-01-01)

An Analysis of Context of Culture and Context of Situation in Obama’s Speech Text
By: Samsudin, et al.
Published: (2020-10-01)

Using BiLSTM Networks for Context-Aware Deep Sensitivity Labelling on Conversational Data
By: Antreas Pogiatzis, et al.
Published: (2020-12-01)

The Reproducibility of Bio-Acoustic Features is Associated With Sample Duration, Speech Task, and Gender
By: Shaykhah A. Almaghrabi, et al.
Published: (2022-01-01)

Strong Generalized Speech Emotion Recognition Based on Effective Data Augmentation
By: Huawei Tao, et al.
Published: (2022-12-01)

Text Augmentation Using BERT for Image Captioning
By: Viktar Atliha, et al.
Published: (2020-08-01)

Internet bad information detection based on Bert model
By: Xin CAI
Published: (2020-11-01)

An Indoor Location-Based Augmented Reality Framework
By: Jehn-Ruey Jiang, et al.
Published: (2023-01-01)

Research on feature extraction of unstructured large power texts
By: WANG Jiakai, et al.
Published: (2024-06-01)

Iranian Speech-language Pathologists’ Awareness of Alternative and Augmentative Communication Methods
By: Talieh Zarifian, et al.
Published: (2021-03-01)

Design of the Speech Emotion Recognition Model
By: Hanping Ke, et al.
Published: (2023-07-01)

Using Data Augmentation and Time-Scale Modification to Improve ASR of Children’s Speech in Noisy Environments
By: Hemant Kumar Kathania, et al.
Published: (2021-09-01)

Text-to-speech system for low-resource language using cross-lingual transfer learning and data augmentation
By: Zolzaya Byambadorj, et al.
Published: (2021-12-01)

Multimodal transformer augmented fusion for speech emotion recognition
By: Yuanyuan Wang, et al.
Published: (2023-05-01)

A multimodal dialog approach to mental state characterization in clinically depressed, anxious, and suicidal populations
By: Joshua Cohen, et al.
Published: (2023-09-01)

Automatic Classification of Speech Dysarthric Intelligibility Levels Using Textual Feature
By: Ghadeer F. Alharbi, et al.
Published: (2025-01-01)

Attention-based speech feature transfer between speakers
By: Hangbok Lee, et al.
Published: (2024-02-01)

Intelligence Context Aware Mobile Navigation using Augmented Reality Technology
By: Ahmad Hoirul Basori, et al.
Published: (2018-04-01)

An improved data augmentation approach and its application in medical named entity recognition
By: Hongyu Chen, et al.
Published: (2024-08-01)

Comprehensive Context Recognizer Based on Multimodal Sensors in a Smartphone
By: Sungyoung Lee, et al.
Published: (2012-09-01)

Spoof speech detection based on context information and attention feature
By: Jia CHEN, et al.
Published: (2023-02-01)

Medical Named Entity Recognition Fusing Part-of-Speech and Stroke Features
By: Fen Yi, et al.
Published: (2023-08-01)

A Hybrid Deep Learning Emotion Classification System Using Multimodal Data
By: Dong-Hwi Kim, et al.
Published: (2023-11-01)

Emotional Text-To-Speech in Japanese Using Artificially Augmented Dataset
By: Mujahid Jamal A. Khalifah, et al.
Published: (2024-01-01)

Bidirectional Feature Fusion and Enhanced Alignment Based Multimodal Semantic Segmentation for Remote Sensing Images
By: Qianqian Liu, et al.
Published: (2024-06-01)

Disaster Image Classification by Fusing Multimodal Social Media Data
By: Zhiqiang Zou, et al.
Published: (2021-09-01)

Lip2Speech: Lightweight Multi-Speaker Speech Reconstruction with Gabor Features
By: Zhongping Dong, et al.
Published: (2024-01-01)

Large language models and speech genre systematicity
By: Devyatkin, Dmitry Alekseevich, et al.
Published: (2025-02-01)

Effect on speech emotion classification of a feature selection approach using a convolutional neural network
By: Ammar Amjad, et al.
Published: (2021-11-01)