AUTOMATIC SPEECH RECOGNITION IN PRESENCE OF MUSIC NOISE ON MULTICHANNEL FAR-FIELD RECORDINGS

Bibliographic Details
Main Authors: S. S. Astapov, V. I. Kabarov, E. V. Shuranov, A. V. Lavrentyev
Format: Article
Language: English
Published: Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University) 2019-01-01
Series: Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki
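Subjects: microphone array; MVDR; acoustic model; noise mask estimation; music noise reduction; automatic speech recognition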
Online Access: https://ntv.ifmo.ru/file/article/18676.pdf
author S. S. Astapov
V. I. Kabarov
E. V. Shuranov
A. V. Lavrentyev
collection DOAJ
description Subject of Research. The paper considers a method of music noise reduction in a multichannel speech signal based on noise mask estimation. The method is applied to automatic speech recognition in the presence of music noise. Method. The study uses an acoustic model implemented with artificial neural networks and real-life recordings made in reverberant conditions. Main Results. It is shown that the acoustic model is capable of estimating the noise mask on a multichannel mixture for different music genres. Applying this mask to covariance matrix estimation for the MVDR (Minimum Variance Distortionless Response) beamforming algorithm increases recognition accuracy by at least 4.9 % at signal-to-noise ratio levels of 10–30 dB. Practical Relevance. Estimating MVDR coefficients from a noise mask produced by an acoustic model suppresses non-stationary noise, such as music, and thereby increases the robustness of automatic speech recognition systems.
first_indexed 2024-12-19T04:29:52Z
format Article
id doaj.art-16643767234d40f683724c7fbbd3f97c
institution Directory Open Access Journal
issn 2226-1494
2500-0373
language English
last_indexed 2024-12-19T04:29:52Z
publishDate 2019-01-01
publisher Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University)
record_format Article
series Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki
spelling doaj.art-16643767234d40f683724c7fbbd3f97c 2022-12-21T20:35:54Z eng; volume 19, issue 3, pages 557-560; DOI 10.17586/2226-1494-2019-19-3-557-559
title AUTOMATIC SPEECH RECOGNITION IN PRESENCE OF MUSIC NOISE ON MULTICHANNEL FAR-FIELD RECORDINGS
topic microphone array
MVDR
acoustic model
noise mask estimation
music noise reduction
automatic speech recognition
url https://ntv.ifmo.ru/file/article/18676.pdf
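
The description field outlines the processing chain only in words: a noise mask is estimated, the mask is used to estimate a covariance matrix, and MVDR coefficients are derived from it. This record does not give the paper's formulas, so the following is only a minimal sketch of one common way such a chain can be written, not the authors' implementation. The function names, the array layout (channels, frames, frequency bins), the diagonal loading term, and the externally supplied steering vector are all assumptions made for illustration; the neural-network mask estimator itself is not modelled and the mask is taken as a given input.

```python
import numpy as np


def masked_noise_covariance(stft, noise_mask, eps=1e-8):
    """Mask-weighted noise spatial covariance per frequency bin.

    stft:       complex array, shape (channels, frames, bins)  [assumed layout]
    noise_mask: real array in [0, 1], shape (frames, bins)
    returns:    complex array, shape (bins, channels, channels)
    """
    weighted = stft * noise_mask[None, :, :]
    # Sum over frames of m(t, f) * x(t, f) x(t, f)^H for every bin f.
    cov = np.einsum("ctf,dtf->fcd", weighted, stft.conj())
    norm = noise_mask.sum(axis=0)
    return cov / (norm[:, None, None] + eps)


def mvdr_weights(noise_cov, steering, diag_load=1e-6):
    """MVDR weights w = R^-1 d / (d^H R^-1 d) for every frequency bin.

    noise_cov: shape (bins, channels, channels)
    steering:  shape (bins, channels), assumed known (hypothetical input)
    returns:   shape (bins, channels)
    """
    channels = noise_cov.shape[-1]
    # Diagonal loading is an added assumption to keep the solve well conditioned.
    loaded = noise_cov + diag_load * np.eye(channels)[None, :, :]
    num = np.linalg.solve(loaded, steering[..., None])[..., 0]
    den = np.einsum("fc,fc->f", steering.conj(), num)
    return num / den[:, None]


def apply_beamformer(weights, stft):
    """Single-channel output y(t, f) = w(f)^H x(t, f)."""
    return np.einsum("fc,ctf->tf", weights.conj(), stft)
```

With a mask close to 1 on music-dominated time-frequency cells, the covariance is driven mainly by the noise, and the resulting weights keep unit gain in the steering direction while minimising the output power of that noise. How the steering vector is obtained is left open here.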