AUTOMATIC SPEECH RECOGNITION IN PRESENCE OF MUSIC NOISE ON MULTICHANNEL FAR-FIELD RECORDINGS
Subject of Research. The paper considers a method for reducing music noise in a multichannel speech signal based on noise mask estimation. The method is applied to automatic speech recognition in the presence of music noise. Method. The study is performed using an acoustic model implemented in artificia...
Main Authors: | S. S. Astapov, V. I. Kabarov, E. V. Shuranov, A. V. Lavrentyev |
---|---|
Format: | Article |
Language: | English |
Published: | Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University), 2019-01-01 |
Series: | Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki |
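Subjects: | microphone array; MVDR; acoustic model; noise mask estimation; music noise reduction; automatic speech recognition |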
Online Access: | https://ntv.ifmo.ru/file/article/18676.pdf |
_version_ | 1818841676854067200 |
---|---|
author | S. S. Astapov V. I. Kabarov E. V. Shuranov A. V. Lavrentyev |
author_sort | S. S. Astapov |
collection | DOAJ |
description | Subject of Research. The paper considers a method for reducing music noise in a multichannel speech signal based on noise mask estimation. The method is applied to automatic speech recognition in the presence of music noise. Method. The study uses an acoustic model implemented with artificial neural networks and real-life recordings made in reverberant conditions. Main Results. The acoustic model is shown to be capable of estimating the noise mask on a multichannel mixture for different music genres. Applying this mask to covariance matrix estimation for the MVDR (Minimum Variance Distortionless Response) beamforming algorithm increases recognition accuracy by at least 4.9 % at signal-to-noise ratios of 10–30 dB. Practical Relevance. Estimating MVDR coefficients from a noise mask produced by an acoustic model suppresses non-stationary noise, such as music, and thus increases the robustness of automatic speech recognition systems. |
first_indexed | 2024-12-19T04:29:52Z |
format | Article |
id | doaj.art-16643767234d40f683724c7fbbd3f97c |
institution | Directory Open Access Journal |
issn | 2226-1494 2500-0373 |
language | English |
last_indexed | 2024-12-19T04:29:52Z |
publishDate | 2019-01-01 |
publisher | Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University) |
record_format | Article |
series | Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki |
spelling | doaj.art-16643767234d40f683724c7fbbd3f97c 2022-12-21T20:35:54Z eng Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University) Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki 2226-1494 2500-0373 2019-01-01 19 3 557 560 10.17586/2226-1494-2019-19-3-557-559 AUTOMATIC SPEECH RECOGNITION IN PRESENCE OF MUSIC NOISE ON MULTICHANNEL FAR-FIELD RECORDINGS S. S. Astapov V. I. Kabarov E. V. Shuranov A. V. Lavrentyev Subject of Research. The paper considers a method for reducing music noise in a multichannel speech signal based on noise mask estimation. The method is applied to automatic speech recognition in the presence of music noise. Method. The study uses an acoustic model implemented with artificial neural networks and real-life recordings made in reverberant conditions. Main Results. The acoustic model is shown to be capable of estimating the noise mask on a multichannel mixture for different music genres. Applying this mask to covariance matrix estimation for the MVDR (Minimum Variance Distortionless Response) beamforming algorithm increases recognition accuracy by at least 4.9 % at signal-to-noise ratios of 10–30 dB. Practical Relevance. Estimating MVDR coefficients from a noise mask produced by an acoustic model suppresses non-stationary noise, such as music, and thus increases the robustness of automatic speech recognition systems. https://ntv.ifmo.ru/file/article/18676.pdf microphone array MVDR acoustic model noise mask estimation music noise reduction automatic speech recognition |
spellingShingle | S. S. Astapov V. I. Kabarov E. V. Shuranov A. V. Lavrentyev AUTOMATIC SPEECH RECOGNITION IN PRESENCE OF MUSIC NOISE ON MULTICHANNEL FAR-FIELD RECORDINGS Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki microphone array MVDR acoustic model noise mask estimation music noise reduction automatic speech recognition |
title | AUTOMATIC SPEECH RECOGNITION IN PRESENCE OF MUSIC NOISE ON MULTICHANNEL FAR-FIELD RECORDINGS |
title_sort | automatic speech recognition in presence of music noise on multichannel far field recordings |
topic | microphone array MVDR acoustic model noise mask estimation music noise reduction automatic speech recognition |
url | https://ntv.ifmo.ru/file/article/18676.pdf |
work_keys_str_mv | AT ssastapov automaticspeechrecognitioninpresenceofmusicnoiseonmultichannelfarfieldrecordings AT vikabarov automaticspeechrecognitioninpresenceofmusicnoiseonmultichannelfarfieldrecordings AT evshuranov automaticspeechrecognitioninpresenceofmusicnoiseonmultichannelfarfieldrecordings AT avlavrentyev automaticspeechrecognitioninpresenceofmusicnoiseonmultichannelfarfieldrecordings |
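
The description above mentions applying an estimated noise mask to covariance matrix estimation for MVDR beamforming. The record does not give the authors' formulas, so the following is only a minimal NumPy sketch of one common formulation: the noise covariance per frequency bin is a mask-weighted average of outer products of the multichannel STFT, and the MVDR weights are w = R⁻¹d / (dᴴR⁻¹d). The function names, the array layout (channels, frames, frequency bins), the diagonal loading term, and the availability of a steering vector `steering` are all assumptions made for illustration; in the paper the mask comes from a neural acoustic model, which is not reproduced here.

```python
import numpy as np


def masked_covariance(stft, mask, eps=1e-10):
    """Mask-weighted spatial covariance per frequency bin.

    stft: complex array, shape (channels, frames, freqs)
    mask: real array in [0, 1], shape (frames, freqs); values near 1 mark
          time-frequency cells assumed to be dominated by noise
    returns: complex array, shape (freqs, channels, channels)
    """
    weighted = stft * mask[None, :, :]
    # Sum over frames of mask * x x^H, separately for every frequency bin.
    cov = np.einsum("ctf,dtf->fcd", weighted, stft.conj())
    norm = np.maximum(mask.sum(axis=0), eps)
    return cov / norm[:, None, None]


def mvdr_weights(noise_cov, steering, diag_load=1e-6):
    """MVDR weights w = R^-1 d / (d^H R^-1 d) for every frequency bin.

    noise_cov: shape (freqs, channels, channels)
    steering:  shape (freqs, channels), assumed known (hypothetical input)
    """
    channels = noise_cov.shape[-1]
    # Diagonal loading (an assumption of this sketch) keeps R invertible.
    trace = np.trace(noise_cov, axis1=-2, axis2=-1).real
    load = diag_load * trace / channels + 1e-12
    loaded = noise_cov + load[:, None, None] * np.eye(channels)
    r_inv_d = np.linalg.solve(loaded, steering[..., None])[..., 0]
    denom = np.einsum("fc,fc->f", steering.conj(), r_inv_d)
    return r_inv_d / denom[:, None]


def apply_beamformer(weights, stft):
    """Single-channel output y(t, f) = w(f)^H x(t, f)."""
    return np.einsum("fc,ctf->tf", weights.conj(), stft)
```

A call sequence under these assumptions would be `cov = masked_covariance(stft, mask)`, then `w = mvdr_weights(cov, steering)`, then `y = apply_beamformer(w, stft)`. Because the weights satisfy wᴴd = 1, a signal arriving along the assumed steering vector passes undistorted while the mask-weighted noise power is minimized.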