Linguistic-Coupled Age-to-Age Voice Translation to Improve Speech Recognition Performance in Real Environments

We address the low automatic speech recognition (ASR) performance for elderly speakers through feature adaptation that is agnostic to the ASR system. Most training datasets for speech recognition models are collected from adult speakers. Consequently, the majority of commercial speech recognition...

Bibliographic Details
Main Authors: June-Woo Kim, Hyekyung Yoon, Ho-Young Jung
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
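Subjects: Speech recognition; voice translation; spectral feature transform; age-on-demand speech recognition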
Online Access: https://ieeexplore.ieee.org/document/9548063/
author June-Woo Kim
Hyekyung Yoon
Ho-Young Jung
collection DOAJ
description We address the low automatic speech recognition (ASR) performance for elderly speakers through feature adaptation that is agnostic to the ASR system. Most training datasets for speech recognition models are collected from adult speakers. Consequently, the majority of commercial speech recognition systems tend to perform well on adult speakers. In other words, the limited speaker diversity in the training datasets yields unreliable performance for minority (e.g., elderly) speakers, because acquiring training data for them is infeasible. In response, this paper proposes a neural network-based voice conversion framework to improve speech recognition for minority speakers. To this end, we propose a voice translation model that includes unsupervised phonology clustering to extract linguistic information and fit the minority speakers' speech to a current acoustic model frame. Our proposal is a spectral feature adaptation method that can be placed in front of any commercial or open ASR system, which avoids modifying the speech recognizer directly. The experimental results and analysis demonstrate the effectiveness of the proposed method through improved elderly speech recognition accuracy.
first_indexed 2024-12-16T09:10:09Z
format Article
id doaj.art-362c72b7257a4bb89169e5870d200ee5
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-16T09:10:09Z
publishDate 2021-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-362c72b7257a4bb89169e5870d200ee5 | 2022-12-21T22:37:00Z | eng | IEEE | IEEE Access | ISSN 2169-3536 | 2021-01-01 | vol. 9, pp. 136476-136486 | DOI 10.1109/ACCESS.2021.3115608 | IEEE Xplore document 9548063
June-Woo Kim (https://orcid.org/0000-0003-0111-300X), Department of Artificial Intelligence, Graduate School, Kyungpook National University, Daegu, Republic of Korea
Hyekyung Yoon (https://orcid.org/0000-0002-1507-5967), Department of Artificial Intelligence, Graduate School, Kyungpook National University, Daegu, Republic of Korea
Ho-Young Jung (https://orcid.org/0000-0003-0398-831X), Department of Artificial Intelligence, Graduate School, Kyungpook National University, Daegu, Republic of Korea
title Linguistic-Coupled Age-to-Age Voice Translation to Improve Speech Recognition Performance in Real Environments
topic Speech recognition
voice translation
spectral feature transform
age-on-demand speech recognition
url https://ieeexplore.ieee.org/document/9548063/