Phonetically rich and balanced text and speech corpora for Arabic language

This paper describes the preparation, recording, analysis, and evaluation of a new speech corpus for Modern Standard Arabic (MSA). The speech corpus contains a total of 415 sentences recorded by 40 native Arabic speakers (20 male and 20 female) from 11 different Arab countries representing three...

Bibliographic Details
Main Authors: Abushariah, Mohammad Abd-Alrahman Mahmoud, Ainon, Raja Noor, Zainuddin, Roziati, Elshafei, Moustafa, Khalifa, Othman Omran
Format: Article
Language: English
Published: Springer 2011
Subjects: TK7885 Computer engineering
Online Access: http://irep.iium.edu.my/10572/4/Phonetically_rich_Irep_ID10572.pdf
Collection: IIUM
Description: This paper describes the preparation, recording, analysis, and evaluation of a new speech corpus for Modern Standard Arabic (MSA). The speech corpus contains a total of 415 sentences recorded by 40 native Arabic speakers (20 male and 20 female) from 11 different Arab countries representing three major regions (Levant, Gulf, and Africa). Three hundred and sixty-seven sentences are considered phonetically rich and balanced and are used for training Arabic Automatic Speech Recognition (ASR) systems. Rich means that the sentence set must contain all phonemes of the Arabic language, whereas balanced means that it must preserve the phonetic distribution of the Arabic language. The remaining 48 sentences are created for
Record ID: oai:generic.eprints.org:10572
Institution: International Islamic University Malaysia
Record URL: http://irep.iium.edu.my/10572/
Record timestamp: 2020-10-22T01:05:56Z
Subject: TK7885 Computer engineering
Publisher: Springer
Date: 2011-11-05
Type: Article, PeerReviewed
File format: application/pdf
Citation: Abushariah, Mohammad Abd-Alrahman Mahmoud and Ainon, Raja Noor and Zainuddin, Roziati and Elshafei, Moustafa and Khalifa, Othman Omran (2011) Phonetically rich and balanced text and speech corpora for Arabic language. Language Resources and Evaluation. pp. 1-34. ISSN 1574-020X
DOI: http://dx.doi.org/10.1007/s10579-011-9166-8