Optimizing Speech Recognition Using a Computational Model of Human Hearing: Effect of Noise Type and Efferent Time Constants
Physiological and psychophysical methods allow for an extended investigation of ascending (afferent) neural pathways from the ear to the brain in mammals, and their role in enhancing signals in noise. However, there is increased interest in descending (efferent) neural fibers in the mammalian audito...
Main Authors: | Ifat Yasin, Vit Drga, Fangqi Liu, Andreas Demosthenous, Ray Meddis |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2020-01-01 |
Series: | IEEE Access |
Subjects: | Auditory; hearing; efferent; Medial OlivoCochlear (MOC); speech recognition; auditory model |
Online Access: | https://ieeexplore.ieee.org/document/9044371/ |
_version_ | 1819173671301808128 |
---|---|
author | Ifat Yasin Vit Drga Fangqi Liu Andreas Demosthenous Ray Meddis |
author_facet | Ifat Yasin Vit Drga Fangqi Liu Andreas Demosthenous Ray Meddis |
author_sort | Ifat Yasin |
collection | DOAJ |
description | Physiological and psychophysical methods allow for an extended investigation of ascending (afferent) neural pathways from the ear to the brain in mammals, and their role in enhancing signals in noise. However, there is increased interest in descending (efferent) neural fibers in the mammalian auditory pathway. This efferent pathway operates via the olivocochlear system, modifying auditory processing by cochlear innervation and enhancing human ability to detect sounds in noisy backgrounds. Effective speech intelligibility may depend on a complex interaction between efferent time-constants and types of background noise. In this study, an auditory model with efferent-inspired processing provided the front-end to an automatic-speech-recognition system (ASR), used as a tool to evaluate speech recognition with changes in time-constants (50 to 2000 ms) and background noise type (unmodulated and modulated noise). With efferent activation, maximal speech recognition improvement (for both noise types) occurred for signal-to-noise ratios around 10 dB, characteristic of real-world speech-listening situations. Net speech improvement due to efferent activation (NSIEA) was smaller in modulated noise than in unmodulated noise. For unmodulated noise, NSIEA increased with increasing time-constant. For modulated noise, NSIEA increased for time-constants up to 200 ms but remained similar for longer time-constants, consistent with speech-envelope modulation times important to speech recognition in modulated noise. The model improves our understanding of the complex interactions involved in speech recognition in noise, and could be used to simulate the difficulties of speech perception in noise as a consequence of different types of hearing loss. |
first_indexed | 2024-12-22T20:26:47Z |
format | Article |
id | doaj.art-5e438c5f922a407298a46c5fa386f4ed |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-22T20:26:47Z |
publishDate | 2020-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-5e438c5f922a407298a46c5fa386f4ed; 2022-12-21T18:13:43Z; eng; IEEE; IEEE Access; 2169-3536; 2020-01-01; vol. 8; pp. 56711-56719; 10.1109/ACCESS.2020.2981885; 9044371; Optimizing Speech Recognition Using a Computational Model of Human Hearing: Effect of Noise Type and Efferent Time Constants; Ifat Yasin [0] (https://orcid.org/0000-0002-7055-6448); Vit Drga [1]; Fangqi Liu [2]; Andreas Demosthenous [3] (https://orcid.org/0000-0003-0623-963X); Ray Meddis [4]; [0] Department of Computer Science, University College London, London, U.K.; [1] Department of Computer Science, University College London, London, U.K.; [2] Department of Electronic and Electrical Engineering, University College London, London, U.K.; [3] Department of Electronic and Electrical Engineering, University College London, London, U.K.; [4] Department of Psychology, University of Essex, Colchester, U.K.; Physiological and psychophysical methods allow for an extended investigation of ascending (afferent) neural pathways from the ear to the brain in mammals, and their role in enhancing signals in noise. However, there is increased interest in descending (efferent) neural fibers in the mammalian auditory pathway. This efferent pathway operates via the olivocochlear system, modifying auditory processing by cochlear innervation and enhancing human ability to detect sounds in noisy backgrounds. Effective speech intelligibility may depend on a complex interaction between efferent time-constants and types of background noise. In this study, an auditory model with efferent-inspired processing provided the front-end to an automatic-speech-recognition system (ASR), used as a tool to evaluate speech recognition with changes in time-constants (50 to 2000 ms) and background noise type (unmodulated and modulated noise). With efferent activation, maximal speech recognition improvement (for both noise types) occurred for signal-to-noise ratios around 10 dB, characteristic of real-world speech-listening situations. Net speech improvement due to efferent activation (NSIEA) was smaller in modulated noise than in unmodulated noise. For unmodulated noise, NSIEA increased with increasing time-constant. For modulated noise, NSIEA increased for time-constants up to 200 ms but remained similar for longer time-constants, consistent with speech-envelope modulation times important to speech recognition in modulated noise. The model improves our understanding of the complex interactions involved in speech recognition in noise, and could be used to simulate the difficulties of speech perception in noise as a consequence of different types of hearing loss.; https://ieeexplore.ieee.org/document/9044371/; Auditory; hearing; efferent; Medial OlivoCochlear (MOC); speech recognition; auditory model |
spellingShingle | Ifat Yasin Vit Drga Fangqi Liu Andreas Demosthenous Ray Meddis Optimizing Speech Recognition Using a Computational Model of Human Hearing: Effect of Noise Type and Efferent Time Constants IEEE Access Auditory hearing efferent Medial OlivoCochlear (MOC) speech recognition auditory model |
title | Optimizing Speech Recognition Using a Computational Model of Human Hearing: Effect of Noise Type and Efferent Time Constants |
title_full | Optimizing Speech Recognition Using a Computational Model of Human Hearing: Effect of Noise Type and Efferent Time Constants |
title_fullStr | Optimizing Speech Recognition Using a Computational Model of Human Hearing: Effect of Noise Type and Efferent Time Constants |
title_full_unstemmed | Optimizing Speech Recognition Using a Computational Model of Human Hearing: Effect of Noise Type and Efferent Time Constants |
title_short | Optimizing Speech Recognition Using a Computational Model of Human Hearing: Effect of Noise Type and Efferent Time Constants |
title_sort | optimizing speech recognition using a computational model of human hearing effect of noise type and efferent time constants |
topic | Auditory hearing efferent Medial OlivoCochlear (MOC) speech recognition auditory model |
url | https://ieeexplore.ieee.org/document/9044371/ |
work_keys_str_mv | AT ifatyasin optimizingspeechrecognitionusingacomputationalmodelofhumanhearingeffectofnoisetypeandefferenttimeconstants AT vitdrga optimizingspeechrecognitionusingacomputationalmodelofhumanhearingeffectofnoisetypeandefferenttimeconstants AT fangqiliu optimizingspeechrecognitionusingacomputationalmodelofhumanhearingeffectofnoisetypeandefferenttimeconstants AT andreasdemosthenous optimizingspeechrecognitionusingacomputationalmodelofhumanhearingeffectofnoisetypeandefferenttimeconstants AT raymeddis optimizingspeechrecognitionusingacomputationalmodelofhumanhearingeffectofnoisetypeandefferenttimeconstants |