Linguistic-Coupled Age-to-Age Voice Translation to Improve Speech Recognition Performance in Real Environments
We address a low-performance problem of the elderly in automatic speech recognition (ASR) through feature adaptation agnostic to the ASR. Most of the datasets for speech recognition models consist of datasets collected from adult speakers. Consequently, the majority of commercial speech recognition...
Main Authors: | June-Woo Kim, Hyekyung Yoon, Ho-Young Jung |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2021-01-01 |
Series: | IEEE Access |
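Subjects: | Speech recognition; voice translation; spectral feature transform; age-on-demand speech recognition |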
Online Access: | https://ieeexplore.ieee.org/document/9548063/ |
_version_ | 1818587519728484352 |
---|---|
author | June-Woo Kim; Hyekyung Yoon; Ho-Young Jung |
author_facet | June-Woo Kim; Hyekyung Yoon; Ho-Young Jung |
author_sort | June-Woo Kim |
collection | DOAJ |
description | We address a low-performance problem of the elderly in automatic speech recognition (ASR) through feature adaptation agnostic to the ASR. Most of the datasets for speech recognition models consist of datasets collected from adult speakers. Consequently, the majority of commercial speech recognition systems typically tend to perform well on adult speakers. In other words, the limited diversity of speakers in the training datasets yields unreliable performance for minority (e.g., elderly) speakers due to the infeasible acquisition of training data. In response, this paper suggests a neural network-based voice conversion framework to enhance speech recognition of the minority. To this end, we propose a voice translation model including an unsupervised phonology clustering to extract linguistic information to fit the minority’s speech to a current acoustic model frame. Our proposal is a spectral feature adaptation method that can be placed in front of any commercial or open ASR system, avoiding directly modifying the speech recognizer. The experimental results and analysis demonstrate the effectiveness of our proposed method through improvement in elderly speech recognition accuracy. |
first_indexed | 2024-12-16T09:10:09Z |
format | Article |
id | doaj.art-362c72b7257a4bb89169e5870d200ee5 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-16T09:10:09Z |
publishDate | 2021-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-362c72b7257a4bb89169e5870d200ee5; 2022-12-21T22:37:00Z; eng; IEEE; IEEE Access; 2169-3536; 2021-01-01; vol. 9, pp. 136476-136486; 10.1109/ACCESS.2021.3115608; 9548063; Linguistic-Coupled Age-to-Age Voice Translation to Improve Speech Recognition Performance in Real Environments; June-Woo Kim (0), https://orcid.org/0000-0003-0111-300X; Hyekyung Yoon (1), https://orcid.org/0000-0002-1507-5967; Ho-Young Jung (2), https://orcid.org/0000-0003-0398-831X; (0) Department of Artificial Intelligence, Graduate School, Kyungpook National University, Daegu, Republic of Korea; (1) Department of Artificial Intelligence, Graduate School, Kyungpook National University, Daegu, Republic of Korea; (2) Department of Artificial Intelligence, Graduate School, Kyungpook National University, Daegu, Republic of Korea; We address a low-performance problem of the elderly in automatic speech recognition (ASR) through feature adaptation agnostic to the ASR. Most of the datasets for speech recognition models consist of datasets collected from adult speakers. Consequently, the majority of commercial speech recognition systems typically tend to perform well on adult speakers. In other words, the limited diversity of speakers in the training datasets yields unreliable performance for minority (e.g., elderly) speakers due to the infeasible acquisition of training data. In response, this paper suggests a neural network-based voice conversion framework to enhance speech recognition of the minority. To this end, we propose a voice translation model including an unsupervised phonology clustering to extract linguistic information to fit the minority’s speech to a current acoustic model frame. Our proposal is a spectral feature adaptation method that can be placed in front of any commercial or open ASR system, avoiding directly modifying the speech recognizer. The experimental results and analysis demonstrate the effectiveness of our proposed method through improvement in elderly speech recognition accuracy.; https://ieeexplore.ieee.org/document/9548063/; Speech recognition; voice translation; spectral feature transform; age-on-demand speech recognition |
spellingShingle | June-Woo Kim Hyekyung Yoon Ho-Young Jung Linguistic-Coupled Age-to-Age Voice Translation to Improve Speech Recognition Performance in Real Environments IEEE Access Speech recognition voice translation spectral feature transform age-on-demand speech recognition |
title | Linguistic-Coupled Age-to-Age Voice Translation to Improve Speech Recognition Performance in Real Environments |
title_full | Linguistic-Coupled Age-to-Age Voice Translation to Improve Speech Recognition Performance in Real Environments |
title_fullStr | Linguistic-Coupled Age-to-Age Voice Translation to Improve Speech Recognition Performance in Real Environments |
title_full_unstemmed | Linguistic-Coupled Age-to-Age Voice Translation to Improve Speech Recognition Performance in Real Environments |
title_short | Linguistic-Coupled Age-to-Age Voice Translation to Improve Speech Recognition Performance in Real Environments |
title_sort | linguistic coupled age to age voice translation to improve speech recognition performance in real environments |
topic | Speech recognition; voice translation; spectral feature transform; age-on-demand speech recognition |
url | https://ieeexplore.ieee.org/document/9548063/ |
work_keys_str_mv | AT junewookim linguisticcoupledagetoagevoicetranslationtoimprovespeechrecognitionperformanceinrealenvironments AT hyekyungyoon linguisticcoupledagetoagevoicetranslationtoimprovespeechrecognitionperformanceinrealenvironments AT hoyoungjung linguisticcoupledagetoagevoicetranslationtoimprovespeechrecognitionperformanceinrealenvironments |