Speech Enhancement Performance Based on the MANNER Network Using Feature Fusion

Bibliographic Details
Main Authors: Shijie Wang, Ji Li, Lei Shao, Hongli Liu, Lihua Zhu, Xiaochen Zhu
Format: Article
Language: English
Published: MDPI AG 2023-04-01
Series: Electronics
Online Access: https://www.mdpi.com/2079-9292/12/8/1768
Description
Summary: The multi-view attention network for noise erasure (MANNER) cannot take both local and global features into account in the speech enhancement of long sequences. To address this, an attention and feature fusion MANNER (AF-MANNER) network is proposed, which improves the multi-view attention (MA) module in MANNER by replacing the global and local attention in the module. AF-MANNER also introduces a feature-weighted fusion module that fuses the features of flash attention and neighborhood attention to strengthen the feature representation of the network. Ablation studies show that the network performs well in speech enhancement and that its structure helps improve the intelligibility and perceptual quality of speech.
ISSN: 2079-9292
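
Illustrative note: the summary does not say how the feature-weighted fusion module is built. The sketch below is only one plausible reading, not the authors' implementation. It assumes two feature vectors of equal length (stand-ins for the flash-attention and neighborhood-attention branches) are combined with two softmax-normalised scalar weights. The names `weighted_fusion`, `flash_feat`, `neighborhood_feat`, and `logits` are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fusion(feat_a, feat_b, logits):
    """Fuse two equal-length feature vectors as w_a * a + w_b * b.

    Assumption: one scalar weight per branch. In a trained network the
    two logits would be learnable parameters; here they are fixed inputs.
    """
    if len(feat_a) != len(feat_b):
        raise ValueError("feature vectors must have the same length")
    w_a, w_b = softmax(logits)
    return [w_a * a + w_b * b for a, b in zip(feat_a, feat_b)]


# Toy stand-ins for the outputs of the two attention branches.
flash_feat = [1.0, 1.0, 1.0, 1.0]
neighborhood_feat = [0.0, 0.0, 0.0, 0.0]

# Equal logits give equal weights, so the result is the plain average.
fused = weighted_fusion(flash_feat, neighborhood_feat, logits=[0.0, 0.0])
```

Normalising the weights with a softmax keeps the fused output on the same scale as its inputs. Whether the published module uses scalar, per-channel, or input-dependent weights cannot be determined from this record.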