Optimizing Speech Recognition Using a Computational Model of Human Hearing: Effect of Noise Type and Efferent Time Constants

Physiological and psychophysical methods allow for an extended investigation of ascending (afferent) neural pathways from the ear to the brain in mammals, and their role in enhancing signals in noise. However, there is increased interest in descending (efferent) neural fibers in the mammalian audito...

Bibliographic Details
Main Authors: Ifat Yasin, Vit Drga, Fangqi Liu, Andreas Demosthenous, Ray Meddis
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: Auditory; hearing; efferent; Medial OlivoCochlear (MOC); speech recognition; auditory model
Online Access: https://ieeexplore.ieee.org/document/9044371/
author Ifat Yasin
Vit Drga
Fangqi Liu
Andreas Demosthenous
Ray Meddis
collection DOAJ
description Physiological and psychophysical methods allow for an extended investigation of ascending (afferent) neural pathways from the ear to the brain in mammals, and their role in enhancing signals in noise. However, there is increased interest in descending (efferent) neural fibers in the mammalian auditory pathway. This efferent pathway operates via the olivocochlear system, modifying auditory processing by cochlear innervation and enhancing human ability to detect sounds in noisy backgrounds. Effective speech intelligibility may depend on a complex interaction between efferent time-constants and types of background noise. In this study, an auditory model with efferent-inspired processing provided the front-end to an automatic-speech-recognition system (ASR), used as a tool to evaluate speech recognition with changes in time-constants (50 to 2000 ms) and background noise type (unmodulated and modulated noise). With efferent activation, maximal speech recognition improvement (for both noise types) occurred for signal-to-noise ratios around 10 dB, characteristic of real-world speech-listening situations. Net speech improvement due to efferent activation (NSIEA) was smaller in modulated noise than in unmodulated noise. For unmodulated noise, NSIEA increased with increasing time-constant. For modulated noise, NSIEA increased for time-constants up to 200 ms but remained similar for longer time-constants, consistent with speech-envelope modulation times important to speech recognition in modulated noise. The model improves our understanding of the complex interactions involved in speech recognition in noise, and could be used to simulate the difficulties of speech perception in noise as a consequence of different types of hearing loss.
first_indexed 2024-12-22T20:26:47Z
format Article
id doaj.art-5e438c5f922a407298a46c5fa386f4ed
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-22T20:26:47Z
publishDate 2020-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-5e438c5f922a407298a46c5fa386f4ed; 2022-12-21T18:13:43Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2020-01-01; vol. 8, pp. 56711-56719; DOI 10.1109/ACCESS.2020.2981885; document 9044371
Ifat Yasin (https://orcid.org/0000-0002-7055-6448), Department of Computer Science, University College London, London, U.K.
Vit Drga, Department of Computer Science, University College London, London, U.K.
Fangqi Liu, Department of Electronic and Electrical Engineering, University College London, London, U.K.
Andreas Demosthenous (https://orcid.org/0000-0003-0623-963X), Department of Electronic and Electrical Engineering, University College London, London, U.K.
Ray Meddis, Department of Psychology, University of Essex, Colchester, U.K.
title Optimizing Speech Recognition Using a Computational Model of Human Hearing: Effect of Noise Type and Efferent Time Constants
topic Auditory
hearing
efferent
Medial OlivoCochlear (MOC)
speech recognition
auditory model
url https://ieeexplore.ieee.org/document/9044371/