Speech Recognition for Task Domains with Sparse Matched Training Data

We propose two approaches to handle speech recognition for task domains with sparse matched training data. One is an active learning method that selects training data for the target domain from another general domain that already has a significant amount of labeled speech data. This method uses attr...

Bibliographic Details
Main Authors:	Byung Ok Kang, Hyeong Bae Jeon, Jeon Gue Park
Format:	Article
Language:	English
Published:	MDPI AG 2020-09-01
Series:	Applied Sciences
Subjects:	automatic speech recognition; sparse training data; deep neural network; active learning; transfer learning
Online Access:	https://www.mdpi.com/2076-3417/10/18/6155

author	Byung Ok Kang, Hyeong Bae Jeon, Jeon Gue Park
author_sort	Byung Ok Kang
collection	DOAJ
description	We propose two approaches to handle speech recognition for task domains with sparse matched training data. One is an active learning method that selects training data for the target domain from another general domain that already has a significant amount of labeled speech data. This method uses attribute-disentangled latent variables. For the active learning process, we designed an integrated system consisting of a variational autoencoder with an encoder that infers latent variables with disentangled attributes from the input speech, and a classifier that selects training data with attributes matching the target domain. The other method combines data augmentation methods for generating matched target domain speech data and transfer learning methods based on teacher/student learning. To evaluate the proposed method, we experimented with various task domains with sparse matched training data. The experimental results show that the proposed method has qualitative characteristics that are suitable for the desired purpose, outperforms random selection, and is comparable to using an equal amount of additional target domain data.
first_indexed	2024-03-10T16:34:28Z
format	Article
id	doaj.art-fcc47a5bbbac4264af1f90632a62b6cb
institution	Directory Open Access Journal
issn	2076-3417
language	English
last_indexed	2024-03-10T16:34:28Z
publishDate	2020-09-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-fcc47a5bbbac4264af1f90632a62b6cb; 2023-11-20T12:37:25Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2020-09-01; 10; 18; 6155; 10.3390/app10186155; Speech Recognition for Task Domains with Sparse Matched Training Data; Byung Ok Kang, Hyeong Bae Jeon, Jeon Gue Park; Electronics and Telecommunications Research Institute, Daejeon 34129, Korea (listed for each of the three authors); https://www.mdpi.com/2076-3417/10/18/6155; automatic speech recognition; sparse training data; deep neural network; active learning; transfer learning
title	Speech Recognition for Task Domains with Sparse Matched Training Data
topic	automatic speech recognition; sparse training data; deep neural network; active learning; transfer learning
url	https://www.mdpi.com/2076-3417/10/18/6155

Similar Items