Objective speech intelligibility prediction using a deep learning model with continuous speech-evoked cortical auditory responses
Auditory prostheses provide an opportunity for rehabilitation of hearing-impaired patients. Speech intelligibility can be used to estimate the extent to which the auditory prosthesis improves the user’s speech comprehension. Although behavior-based speech intelligibility testing is the gold standard, precise evaluation is limited by its subjectiveness…
Main Authors: | Youngmin Na, Hyosung Joo, Le Thi Trang, Luong Do Anh Quan, Jihwan Woo
---|---
Format: | Article
Language: | English
Published: | Frontiers Media S.A., 2022-08-01
Series: | Frontiers in Neuroscience
Subjects: | speech intelligibility; deep-learning; continuous speech; occlusion sensitivity; EEG
Online Access: | https://www.frontiersin.org/articles/10.3389/fnins.2022.906616/full
_version_ | 1811214583748100096 |
author | Youngmin Na; Hyosung Joo; Le Thi Trang; Luong Do Anh Quan; Jihwan Woo; Jihwan Woo |
author_sort | Youngmin Na |
collection | DOAJ |
description | Auditory prostheses provide an opportunity for rehabilitation of hearing-impaired patients. Speech intelligibility can be used to estimate the extent to which the auditory prosthesis improves the user’s speech comprehension. Although behavior-based speech intelligibility testing is the gold standard, precise evaluation is limited by its subjectiveness. Here, we used a convolutional neural network to predict speech intelligibility from electroencephalography (EEG). Sixty-four-channel EEGs were recorded from 87 adult participants with normal hearing. Sentences spectrally degraded by a 2-, 3-, 4-, 5-, or 8-channel vocoder were used to create relatively low speech intelligibility conditions, and a Korean sentence recognition test was used. The speech intelligibility scores were divided into 41 discrete levels ranging from 0 to 100% in steps of 2.5%; three scores (30.0, 37.5, and 40.0%) were not collected. Two speech features, the speech temporal envelope (ENV) and phoneme (PH) onset, were used to extract continuous-speech EEG measures for speech intelligibility prediction. The deep learning model was trained on a dataset of event-related potentials (ERPs) or of correlation coefficients between the ERPs and the ENVs, between the ERPs and the PH onsets, or between the ERPs and the product of PH and ENV (PHENV). The speech intelligibility prediction accuracies were 97.33% (ERP), 99.42% (ENV), 99.55% (PH), and 99.91% (PHENV). The models were interpreted using the occlusion sensitivity approach: the occlusion sensitivity maps located the ENV model’s informative electrodes in the occipital area, whereas those of the phoneme-based models (PH and PHENV) lay in the language-processing area. Of the models tested, the PHENV model achieved the best speech intelligibility prediction accuracy. This model may promote clinical prediction of speech intelligibility with a more comfortable speech intelligibility test. |
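The correlation features described in the abstract pair a recorded brain response with a speech feature on a common time grid. As a rough illustration only (not the authors' code), the sketch below computes Pearson correlation coefficients between a single-electrode ERP trace and the ENV, and between the ERP and the PHENV product feature; all arrays are invented toy data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy)

# Invented toy data: one electrode's ERP trace, a speech temporal
# envelope (ENV), and a phoneme-onset train (PH) on the same time grid.
erp = [0.1, 0.4, 0.9, 0.5, 0.2, -0.1, -0.3, -0.2]
env = [0.0, 0.3, 0.8, 0.6, 0.3, 0.1, 0.0, 0.0]
ph = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
phenv = [p * e for p, e in zip(ph, env)]  # element-wise product feature

r_env = pearson_r(erp, env)      # ERP-ENV correlation coefficient
r_phenv = pearson_r(erp, phenv)  # ERP-PHENV correlation coefficient
```

In the study, coefficients like these (over all 64 electrodes) form the input dataset on which the convolutional neural network is trained.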
first_indexed | 2024-04-12T06:07:10Z |
format | Article |
id | doaj.art-4c3e03792b1a43f781875a6f54b3595d |
institution | Directory Open Access Journal |
issn | 1662-453X |
language | English |
last_indexed | 2024-04-12T06:07:10Z |
publishDate | 2022-08-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Neuroscience |
spelling | doaj.art-4c3e03792b1a43f781875a6f54b3595d (indexed 2022-12-22T03:44:50Z); English; Frontiers Media S.A.; Frontiers in Neuroscience; ISSN 1662-453X; 2022-08-01; vol. 16; doi:10.3389/fnins.2022.906616; article 906616. Title: Objective speech intelligibility prediction using a deep learning model with continuous speech-evoked cortical auditory responses. Authors: Youngmin Na (Department of Biomedical Engineering, University of Ulsan, Ulsan, South Korea); Hyosung Joo (Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan, South Korea); Le Thi Trang (Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan, South Korea); Luong Do Anh Quan (Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan, South Korea); Jihwan Woo (Department of Biomedical Engineering, University of Ulsan, Ulsan, South Korea; Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan, South Korea). Abstract: as in the description field above. Online access: https://www.frontiersin.org/articles/10.3389/fnins.2022.906616/full. Keywords: speech intelligibility; deep-learning; continuous speech; occlusion sensitivity; EEG |
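The occlusion sensitivity approach named in this record interprets a trained model by masking one input region at a time and measuring how much the model's score drops; inputs whose occlusion causes the largest drop are the most informative. The sketch below shows the idea with a hypothetical stand-in scoring function (a weighted sum), not the study's actual CNN over EEG electrodes.

```python
def occlusion_sensitivity(inputs, score_fn, baseline=0.0):
    """Mask one input element at a time with `baseline` and record how much
    the model's score drops; a larger drop marks a more informative input."""
    full_score = score_fn(inputs)
    drops = []
    for i in range(len(inputs)):
        occluded = list(inputs)
        occluded[i] = baseline  # occlude a single input position
        drops.append(full_score - score_fn(occluded))
    return drops

# Hypothetical stand-in "model": a weighted sum, so occluding input i
# (with unit inputs) removes roughly weights[i] from the score.
weights = [0.5, 0.1, 0.9, 0.0]
score = lambda x: sum(w * v for w, v in zip(weights, x))

sensitivity = occlusion_sensitivity([1.0, 1.0, 1.0, 1.0], score)
most_informative = max(range(len(sensitivity)), key=lambda i: sensitivity[i])
# most_informative == 2: the third input drives the score the most.
```

In the study, sliding such an occlusion mask over the CNN's EEG input produced the sensitivity maps that localized informative electrodes to the occipital area (ENV model) or the language-processing area (PH and PHENV models).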
title | Objective speech intelligibility prediction using a deep learning model with continuous speech-evoked cortical auditory responses |
topic | speech intelligibility; deep-learning; continuous speech; occlusion sensitivity; EEG |
url | https://www.frontiersin.org/articles/10.3389/fnins.2022.906616/full |