Modern standard Arabic speech corpus for implementing and evaluating automatic continuous speech recognition systems


Bibliographic Details
Main Authors: Abushariah, Mohammad Abd-Alrahman Mahmoud; Raja Zainal Abidin, Raja Noor Ainon; Zainuddin, Roziati; Alqudah, Assal Ali Mustafa; Ahmed, Moustafa Elshafei; Khalifa, Othman Omran
Format: Article
Language: English
Published: Elsevier 2011
Subjects: T Technology (General)
Online Access: http://irep.iium.edu.my/5625/2/Modern_standard_Arabic_speech_corpus_for.pdf
Abstract: This paper presents our work towards developing a new speech corpus for Modern Standard Arabic (MSA) that can be used to implement and evaluate speaker-independent, large-vocabulary, automatic continuous speech recognition systems for Arabic. The speech corpus was recorded by 40 native Arabic speakers (20 male and 20 female) from 11 countries representing three major regions (Levant, Gulf, and Africa). Three development phases were conducted based on the size of the training data, the number of Gaussian mixture distributions, and the number of tied states (senones). In the third development phase, which used 11 hours of training speech data, the acoustic model is composed of 16 Gaussian mixture distributions, with the state distributions tied to 300 senones. Using three different data sets, the third development phase obtained an average word recognition correctness rate of 94.32% and an average Word Error Rate (WER) of 8.10% for the same speakers with different sentences (testing sentences). For different speakers with the same sentences (training sentences), this work obtained an average correctness rate of 98.10% and an average WER of 2.67%, whereas for different speakers with different sentences (testing sentences) it obtained an average correctness rate of 93.73% and an average WER of 8.75%.
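The abstract reports two metrics per test condition, and they do not sum to 100%. A minimal sketch of why, assuming the common NIST/HTK-style definitions (the record does not say which scoring tool or exact definitions the authors used): after aligning the recognized words against the reference, with N reference words, S substitutions, D deletions and I insertions, correctness is (N - S - D) / N while WER is (S + D + I) / N. Correctness ignores insertions, so WER - (100 - correctness) equals the insertion rate I/N. The helper names `AlignmentCounts`, `align` and `implied_insertion_rate` are hypothetical, chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class AlignmentCounts:
    """Error counts from one reference/hypothesis alignment (hypothetical helper)."""
    substitutions: int
    deletions: int
    insertions: int
    reference_length: int  # N; assumed non-zero

    @property
    def correctness(self) -> float:
        # Assumed definition: (N - S - D) / N, as a percentage; insertions are ignored.
        n = self.reference_length
        return 100.0 * (n - self.substitutions - self.deletions) / n

    @property
    def wer(self) -> float:
        # Assumed definition: (S + D + I) / N, as a percentage.
        n = self.reference_length
        return 100.0 * (self.substitutions + self.deletions + self.insertions) / n


def align(reference: list[str], hypothesis: list[str]) -> AlignmentCounts:
    """Minimum edit distance alignment with unit costs, split into S, D and I."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = edit distance between reference[:i] and hypothesis[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # delete every reference word
    for j in range(1, m + 1):
        cost[0][j] = j  # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or substitution
                cost[i - 1][j] + 1,             # deletion
                cost[i][j - 1] + 1,             # insertion
            )
    # Backtrace from the bottom-right cell to count each error type.
    s = d = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + mismatch:
                s += mismatch
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            d += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return AlignmentCounts(s, d, ins, n)


def implied_insertion_rate(correctness: float, wer: float) -> float:
    """Insertion rate I/N (percent) implied by a correctness/WER pair under the assumed definitions."""
    return wer - (100.0 - correctness)


if __name__ == "__main__":
    counts = align("a b c d".split(), "a x c d e".split())
    print(counts, counts.correctness, counts.wer)
```

In the toy example, one substitution (b to x) and one insertion (e) against four reference words give 75% correctness but 50% WER. Under the same assumed definitions, and assuming both averages are unweighted means over the same per-set percentages, the abstract's pairs would imply average insertion rates of about 2.42%, 0.77% and 2.48% for the three test conditions; this is arithmetic on the reported figures, not a number stated by the authors.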
Institution: International Islamic University Malaysia
Citation: Abushariah, Mohammad Abd-Alrahman Mahmoud and Raja Zainal Abidin, Raja Noor Ainon and Zainuddin, Roziati and Alqudah, Assal Ali Mustafa and Ahmed, Moustafa Elshafei and Khalifa, Othman Omran (2011) Modern standard Arabic speech corpus for implementing and evaluating automatic continuous speech recognition systems. Journal of the Franklin Institute. ISSN 0016-0032 (In Press). Peer reviewed. http://www.sciencedirect.com DOI: 10.1016/j.jfranklin.2011.04.011