FrameAugment: A Simple Data Augmentation Method for Encoder–Decoder Speech Recognition

As the architecture of deep learning-based speech recognizers has recently changed to the end-to-end style, increasing the effective amount of training data has become an important issue. To tackle this issue, various data augmentation techniques to create additional training data by transforming la...

Bibliographic Details
Main Authors:	Seong-Su Lim, Oh-Wook Kwon
Format:	Article
Language:	English
Published:	MDPI AG 2022-07-01
Series:	Applied Sciences
Subjects:	data augmentation; end-to-end speech recognition; frame rate
Online Access:	https://www.mdpi.com/2076-3417/12/15/7619

_version_	1827626127831597056
author	Seong-Su Lim; Oh-Wook Kwon
author_sort	Seong-Su Lim
collection	DOAJ
description	As the architecture of deep learning-based speech recognizers has recently changed to the end-to-end style, increasing the effective amount of training data has become an important issue. To tackle this issue, various data augmentation techniques to create additional training data by transforming labeled data have been studied. We propose a method called FrameAugment to augment data by changing the speed of speech locally for selected sections, which is different from the conventional speed perturbation technique that changes the speed of speech uniformly for the entire utterance. To change the speed of the selected sections of speech, the number of frames for the randomly selected sections is adjusted through linear interpolation in the spectrogram domain. The proposed method is shown to achieve 6.8% better performance than the baseline in the WSJ database and 9.5% better than the baseline in the LibriSpeech database. It is also confirmed that the proposed method further improves speech recognition performance when it is combined with the previous data augmentation techniques.
first_indexed	2024-03-09T12:48:29Z
format	Article
id	doaj.art-e82daf3d55f945edb122b550b162e9be
institution	Directory Open Access Journal
issn	2076-3417
language	English
last_indexed	2024-03-09T12:48:29Z
publishDate	2022-07-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-e82daf3d55f945edb122b550b162e9be; 2023-11-30T22:10:05Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2022-07-01; vol. 12, no. 15, art. 7619; 10.3390/app12157619; FrameAugment: A Simple Data Augmentation Method for Encoder–Decoder Speech Recognition; Seong-Su Lim (Major in Control and Robot Engineering, Chungbuk National University, Cheongju 28644, Korea); Oh-Wook Kwon (Department of Intelligent Systems and Robotics, Chungbuk National University, Cheongju 28644, Korea); https://www.mdpi.com/2076-3417/12/15/7619; data augmentation; end-to-end speech recognition; frame rate
title	FrameAugment: A Simple Data Augmentation Method for Encoder–Decoder Speech Recognition
title_sort	frameaugment a simple data augmentation method for encoder decoder speech recognition
topic	data augmentation; end-to-end speech recognition; frame rate
url	https://www.mdpi.com/2076-3417/12/15/7619
work_keys_str_mv	AT seongsulim frameaugmentasimpledataaugmentationmethodforencoderdecoderspeechrecognition AT ohwookkwon frameaugmentasimpledataaugmentationmethodforencoderdecoderspeechrecognition
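
The description field in this record says that FrameAugment changes the speed of randomly selected sections by adjusting their number of frames through linear interpolation in the spectrogram domain. The following is a minimal NumPy sketch of that idea, not the authors' implementation. The function names, the single warped section per call, and the `max_width` and `rate_range` parameters are all hypothetical choices made for illustration; the record does not state how many sections are selected or which ranges the paper uses.

```python
import numpy as np


def resample_frames(section: np.ndarray, new_len: int) -> np.ndarray:
    """Linearly interpolate a (frames, bins) block along time to new_len frames."""
    old_len = section.shape[0]
    if old_len == 1:
        # A single frame cannot be interpolated; just repeat it.
        return np.repeat(section, new_len, axis=0)
    # Fractional source position of every output frame.
    pos = np.linspace(0.0, old_len - 1, num=new_len)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, old_len - 1)
    frac = (pos - lo)[:, None]
    return (1.0 - frac) * section[lo] + frac * section[hi]


def frame_augment_sketch(spec, max_width=50, rate_range=(0.7, 1.3), rng=None):
    """Warp one randomly chosen section of a (frames, bins) spectrogram.

    max_width and rate_range are assumed values, not taken from the paper.
    A length ratio above 1 adds frames (locally slower speech); a ratio
    below 1 removes frames (locally faster speech).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_frames = spec.shape[0]
    width = int(rng.integers(1, min(max_width, n_frames) + 1))
    start = int(rng.integers(0, n_frames - width + 1))
    ratio = rng.uniform(*rate_range)
    new_len = max(1, int(round(width * ratio)))
    warped = resample_frames(spec[start:start + width], new_len)
    # Frames outside the selected section are left untouched.
    return np.concatenate([spec[:start], warped, spec[start + width:]], axis=0)


if __name__ == "__main__":
    spec = np.random.default_rng(0).normal(size=(100, 80))  # stand-in features
    aug = frame_augment_sketch(spec, rng=np.random.default_rng(1))
    print(spec.shape, "->", aug.shape)
```

Because only the time axis of the selected section changes, the transcript of the utterance stays the same, which is what allows the warped spectrogram to serve as an additional labeled training example.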

Similar Items