Objective speech intelligibility prediction using a deep learning model with continuous speech-evoked cortical auditory responses


Bibliographic Details
Main Authors: Youngmin Na, Hyosung Joo, Le Thi Trang, Luong Do Anh Quan, Jihwan Woo
Format: Article
Language: English
Published: Frontiers Media S.A., 2022-08-01
Series: Frontiers in Neuroscience
Subjects: speech intelligibility; deep learning; continuous speech; occlusion sensitivity; EEG
Online Access:https://www.frontiersin.org/articles/10.3389/fnins.2022.906616/full
Collection: DOAJ (Directory of Open Access Journals)
Description: Auditory prostheses provide an opportunity for rehabilitation of hearing-impaired patients. Speech intelligibility can be used to estimate the extent to which an auditory prosthesis improves the user’s speech comprehension. Although behavior-based speech intelligibility testing is the gold standard, precise evaluation is limited by its subjectiveness. Here, we used a convolutional neural network to predict speech intelligibility from electroencephalography (EEG). Sixty-four-channel EEGs were recorded from 87 adult participants with normal hearing. Sentences spectrally degraded by 2-, 3-, 4-, 5-, and 8-channel vocoders were used to produce relatively low speech intelligibility conditions, and a Korean sentence recognition test was used. The speech intelligibility scores were divided into 41 discrete levels ranging from 0 to 100% in steps of 2.5%; three scores (30.0, 37.5, and 40.0%) were not collected. Two speech features, the speech temporal envelope (ENV) and phoneme (PH) onset, were used to extract continuous-speech EEG responses for speech intelligibility prediction. The deep learning model was trained on datasets of event-related potentials (ERP), of correlation coefficients between the ERPs and the ENVs, between the ERPs and the PH onsets, or between the ERPs and the product of PH and ENV (PHENV). The speech intelligibility prediction accuracies were 97.33% (ERP), 99.42% (ENV), 99.55% (PH), and 99.91% (PHENV). The models were interpreted using the occlusion sensitivity approach: the informative electrodes of the ENV model were located in the occipital area, whereas those of the phoneme-based models (PH and PHENV) were located in the language-processing area. Of the models tested, the PHENV model achieved the best speech intelligibility prediction accuracy. This model may enable clinical prediction of speech intelligibility with a more comfortable speech intelligibility test.
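The feature-construction step the abstract describes — correlating each EEG channel with the speech temporal envelope (ENV), the phoneme-onset indicator (PH), and their sample-wise product (PHENV) to obtain inputs for a classifier — can be sketched as follows. This is a minimal illustration based only on the abstract, not the authors' code; the array shapes, sampling rate, and the toy signals are assumptions.

```python
import numpy as np

def correlation_features(eeg, env, ph_onset):
    """Correlate each EEG channel with the speech features named in the abstract.

    eeg      : (n_channels, n_samples) continuous-speech EEG
    env      : (n_samples,) speech temporal envelope (ENV)
    ph_onset : (n_samples,) phoneme-onset indicator (PH)
    Returns a (n_channels, 3) array of Pearson correlation coefficients
    for ENV, PH, and their sample-wise product (PHENV).
    """
    phenv = env * ph_onset  # product feature (PHENV) from the abstract
    feats = np.empty((eeg.shape[0], 3))
    for i, ch in enumerate(eeg):
        for j, feat in enumerate((env, ph_onset, phenv)):
            feats[i, j] = np.corrcoef(ch, feat)[0, 1]
    return feats

# Toy example: 64 channels, 10 s at an assumed 128 Hz sampling rate
rng = np.random.default_rng(0)
env = np.abs(rng.standard_normal(1280))              # stand-in speech envelope
ph = (rng.random(1280) < 0.05).astype(float)         # sparse phoneme onsets
eeg = 0.5 * env + rng.standard_normal((64, 1280))    # envelope-tracking EEG + noise
print(correlation_features(eeg, env, ph).shape)      # (64, 3)
```

In the study these per-channel correlation values (one map per feature) would then be fed to the convolutional neural network, which classifies them into the discrete speech intelligibility levels.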
Record ID: doaj.art-4c3e03792b1a43f781875a6f54b3595d
ISSN: 1662-453X
DOI: 10.3389/fnins.2022.906616 (Frontiers in Neuroscience, vol. 16, article 906616)
Author affiliations:
Youngmin Na — Department of Biomedical Engineering, University of Ulsan, Ulsan, South Korea
Hyosung Joo — Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan, South Korea
Le Thi Trang — Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan, South Korea
Luong Do Anh Quan — Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan, South Korea
Jihwan Woo — Department of Biomedical Engineering and Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan, South Korea