Harris Hawks Sparse Auto-Encoder Networks for Automatic Speech Recognition System
Automatic speech recognition (ASR) is an effective technique that can convert human speech into text format or computer actions. ASR systems are widely used in smart appliances, smart homes, and biometric systems. Signal processing and machine learning techniques are incorporated to recognize speech...
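Main Authors: | Mohammed Hasan Ali, Mustafa Musa Jaber, Sura Khalil Abd, Amjad Rehman, Mazhar Javed Awan, Daiva Vitkutė-Adžgauskienė, Robertas Damaševičius, Saeed Ali Bahaj |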
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-01-01 |
Series: | Applied Sciences |
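Subjects: | automatic speech recognition; Mel-frequency cepstral coefficients; sparse auto-encoder neural network; hidden Markov model; natural language processing; speech recognition |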
Online Access: | https://www.mdpi.com/2076-3417/12/3/1091 |
_version_ | 1797489356554895360 |
---|---|
author | Mohammed Hasan Ali; Mustafa Musa Jaber; Sura Khalil Abd; Amjad Rehman; Mazhar Javed Awan; Daiva Vitkutė-Adžgauskienė; Robertas Damaševičius; Saeed Ali Bahaj |
author_sort | Mohammed Hasan Ali |
collection | DOAJ |
description | Automatic speech recognition (ASR) is an effective technique that converts human speech into text or computer actions. ASR systems are widely used in smart appliances, smart homes, and biometric systems. They combine signal processing and machine learning techniques to recognize speech. However, traditional systems perform poorly in noisy environments. In addition, accents and regional differences degrade ASR performance when speech signals are analyzed. To overcome these issues, a precise speech recognition system was developed. This paper uses speech data from the jim-schwoebel voice datasets, from which Mel-frequency cepstral coefficients (MFCCs) are computed. The MFCC algorithm extracts the features that are used to recognize speech. Here, a sparse auto-encoder (SAE) neural network is used as the classification model, and a hidden Markov model (HMM) makes the speech recognition decision. The network performance is optimized by applying the Harris Hawks optimization (HHO) algorithm to fine-tune the network parameters. The fine-tuned network can effectively recognize speech in a noisy environment. |
first_indexed | 2024-03-10T00:16:20Z |
format | Article |
id | doaj.art-d4f2719cafe64a56b8afad2a07e52ede |
institution | Directory of Open Access Journals |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-10T00:16:20Z |
publishDate | 2022-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-d4f2719cafe64a56b8afad2a07e52ede; 2023-11-23T15:51:42Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2022-01-01; vol. 12, no. 3, art. 1091; DOI 10.3390/app12031091; Harris Hawks Sparse Auto-Encoder Networks for Automatic Speech Recognition System; Authors: Mohammed Hasan Ali (0), Mustafa Musa Jaber (1), Sura Khalil Abd (2), Amjad Rehman (3), Mazhar Javed Awan (4), Daiva Vitkutė-Adžgauskienė (5), Robertas Damaševičius (6), Saeed Ali Bahaj (7); Affiliations: (0) Computer Techniques Engineering Department, Faculty of Information Technology, Imam Ja’afar Al-sadiq University, Baghdad 10021, Iraq; (1) Department of Computer Science, Dijlah University College, Baghdad 00964, Iraq; (2) Department of Computer Science, Dijlah University College, Baghdad 00964, Iraq; (3) Artificial Intelligence and Data Analytics Laboratory, College of Computer and Information Sciences (CCIS), Prince Sultan University, Riyadh 11586, Saudi Arabia; (4) Department of Software Engineering, University of Management and Technology, Lahore 54770, Pakistan; (5) Department of Applied Informatics, Vytautas Magnus University, 44404 Kaunas, Lithuania; (6) Department of Applied Informatics, Vytautas Magnus University, 44404 Kaunas, Lithuania; (7) MIS Department, College of Business Administration, Prince Sattam bin Abdulaziz University, Alkharj 11942, Saudi Arabia; https://www.mdpi.com/2076-3417/12/3/1091; Keywords: automatic speech recognition; Mel-frequency cepstral coefficients; sparse auto-encoder neural network; hidden Markov model; natural language processing; speech recognition |
title | Harris Hawks Sparse Auto-Encoder Networks for Automatic Speech Recognition System |
topic | automatic speech recognition; Mel-frequency cepstral coefficients; sparse auto-encoder neural network; hidden Markov model; natural language processing; speech recognition |
url | https://www.mdpi.com/2076-3417/12/3/1091 |
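
The description in this record states that MFCCs are used to extract features from the speech signal. As a rough illustration only, and not the authors' implementation, the sketch below computes MFCCs for a single frame using the textbook steps: Hamming window, power spectrum, triangular mel filterbank, logarithm, and DCT-II. The function names, the 8 kHz sample rate, the 20 filters, and the 13 coefficients are all assumptions chosen for the example; the record does not give the paper's settings.

```python
import cmath
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Apply a Hamming window, then return |DFT|^2 / N for bins 0..N/2."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        # Naive DFT: fine for one short frame, too slow for real workloads.
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Build triangular filters whose centres are evenly spaced in mel."""
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = []
    for j in range(1, num_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):      # rising edge
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge, peak at centre
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate, num_filters=20, num_coeffs=13):
    """Return the first num_coeffs MFCCs of one frame (illustrative only)."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # Small floor avoids log(0) for filters that capture no energy.
    log_energies = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                    for row in bank]
    # Unnormalised DCT-II decorrelates the log filterbank energies.
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / num_filters)
                for m, e in enumerate(log_energies))
            for c in range(num_coeffs)]


# Hypothetical input: a 440 Hz tone sampled at 8 kHz, one 256-sample frame.
tone = [math.sin(2.0 * math.pi * 440.0 * i / 8000.0) for i in range(256)]
coeffs = mfcc(tone, 8000)
print(len(coeffs))
```

In practice a full recording would be split into overlapping frames and each frame passed through `mfcc`, giving one feature vector per frame for a downstream classifier such as the SAE the description mentions.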