Optimizing Speech Recognition Using a Computational Model of Human Hearing: Effect of Noise Type and Efferent Time Constants

Physiological and psychophysical methods allow for an extended investigation of ascending (afferent) neural pathways from the ear to the brain in mammals, and their role in enhancing signals in noise. However, there is increased interest in descending (efferent) neural fibers in the mammalian audito...

Bibliographic Details
Main Authors:	Ifat Yasin, Vit Drga, Fangqi Liu, Andreas Demosthenous, Ray Meddis
Format:	Article
Language:	English
Published:	IEEE 2020-01-01
Series:	IEEE Access
Subjects:	Auditory; hearing; efferent; Medial OlivoCochlear (MOC); speech recognition; auditory model
Online Access:	https://ieeexplore.ieee.org/document/9044371/

Description:	Physiological and psychophysical methods allow for an extended investigation of ascending (afferent) neural pathways from the ear to the brain in mammals, and their role in enhancing signals in noise. However, there is increased interest in descending (efferent) neural fibers in the mammalian auditory pathway. This efferent pathway operates via the olivocochlear system, modifying auditory processing by cochlear innervation and enhancing human ability to detect sounds in noisy backgrounds. Effective speech intelligibility may depend on a complex interaction between efferent time-constants and types of background noise. In this study, an auditory model with efferent-inspired processing provided the front-end to an automatic-speech-recognition system (ASR), used as a tool to evaluate speech recognition with changes in time-constants (50 to 2000 ms) and background noise type (unmodulated and modulated noise). With efferent activation, maximal speech recognition improvement (for both noise types) occurred for signal-to-noise ratios around 10 dB, characteristic of real-world speech-listening situations. Net speech improvement due to efferent activation (NSIEA) was smaller in modulated noise than in unmodulated noise. For unmodulated noise, NSIEA increased with increasing time-constant. For modulated noise, NSIEA increased for time-constants up to 200 ms but remained similar for longer time-constants, consistent with speech-envelope modulation times important to speech recognition in modulated noise. The model improves our understanding of the complex interactions involved in speech recognition in noise, and could be used to simulate the difficulties of speech perception in noise as a consequence of different types of hearing loss.
Author Affiliations:	Ifat Yasin (https://orcid.org/0000-0002-7055-6448) and Vit Drga: Department of Computer Science, University College London, London, U.K.; Fangqi Liu and Andreas Demosthenous (https://orcid.org/0000-0003-0623-963X): Department of Electronic and Electrical Engineering, University College London, London, U.K.; Ray Meddis: Department of Psychology, University of Essex, Colchester, U.K.
Citation:	IEEE Access, vol. 8, pp. 56711-56719, 2020
DOI:	10.1109/ACCESS.2020.2981885
ISSN:	2169-3536
Collection:	DOAJ (Directory of Open Access Journals)
Record ID:	doaj.art-5e438c5f922a407298a46c5fa386f4ed

Similar Items