Exploring the Impact of Data Augmentation Techniques on Automatic Speech Recognition System Development: A Comparative Study

Bibliographic Details
Main Authors: GALIC, J., GROZDIC, D.
Format: Article
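Language: English
Published: Stefan cel Mare University of Suceava 2023-08-01
Series: Advances in Electrical and Computer Engineering
Subjects: artificial neural networks, audio databases, automatic speech recognition, hidden Markov models, support vector machines
Online Access: http://dx.doi.org/10.4316/AECE.2023.03001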
collection DOAJ
description Automatic Speech Recognition (ASR) systems are notorious for their poor performance in adverse conditions, leading to high sensitivity and low robustness. Due to the costly and time-consuming nature of creating extensive speech databases, addressing the issue of low robustness has become a prominent area of research, focusing on the synthetic generation of speech data using pre-existing natural speech. This paper examines the impact of standard data augmentation techniques, including pitch shift, time stretch, volume control, and their combination, on the accuracy of isolated-word ASR systems. The performance of three machine learning models, namely Hidden Markov Models (HMM), Support Vector Machines (SVM), and Convolutional Neural Networks (CNN), is analyzed on two Serbian corpora of isolated words. The Whi-Spe speech database in neutral phonation is utilized for augmentation and training, and a specifically developed Python-based software tool is employed for the augmentation process in this research study. The conducted experiments demonstrate a statistically significant reduction in the Word Error Rate (WER) for the CNN-based recognizer on both testing datasets, achieved through a single augmentation technique based on pitch-shifting.
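The abstract names volume control as one augmentation and Word Error Rate (WER) as the evaluation metric. The block below is a minimal sketch of both in plain Python, under stated assumptions: it is not the authors' Python-based tool, and the function names `apply_gain_db` and `word_error_rate` are hypothetical, chosen only for illustration. Samples are assumed to be floats in the range [-1.0, 1.0].

```python
def apply_gain_db(samples, gain_db):
    """Volume-control augmentation: scale samples by a gain given in decibels.

    Assumes samples are floats in [-1.0, 1.0]; results are clipped to that range.
    """
    factor = 10 ** (gain_db / 20)  # amplitude ratio corresponding to gain_db
    return [max(-1.0, min(1.0, s * factor)) for s in samples]


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

For an isolated-word recognizer, each utterance is a single word, so the WER over a test set reduces to the fraction of misrecognized words. Pitch shift and time stretch need a signal-processing library and are left out of this sketch.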
first_indexed 2024-03-12T10:26:43Z
id doaj.art-61e2ace3470249b2a1b89d68dedc0070
institution Directory Open Access Journal
issn 1582-7445
1844-7600
topic artificial neural networks
audio databases
automatic speech recognition
hidden markov models
support vector machines