End to End Text to Speech Synthesis for Malay Language using Tacotron and Tacotron 2


Bibliographic Details
Main Authors: Abdul Aziz, Azrul Fahmi; Sabrina Tiun; Ruslan, Noraini
Format: Article
Language: English
Published: IJACSA 2023
Online Access: http://eprints.uthm.edu.my/10565/1/J16421_03fadd928a98d4594999185deb803d1a.pdf
Description
Summary: Text-to-speech (TTS) technology is becoming increasingly popular in fields such as education and business. However, TTS technology for the Malay language has advanced more slowly than for other languages, especially English. The rise of artificial intelligence (AI) has taken TTS technology in a new direction. An end-to-end (E2E) TTS system, which generates speech directly from text input, is one of the latest AI approaches to TTS, and applying this E2E method to Malay will help expand TTS technology for the language. This study develops and compares two end-to-end TTS models for the Malay language, namely Tacotron and Tacotron 2. Both models were trained on a Malay corpus of text and speech, and the synthesized speech was evaluated using Mean Opinion Scores (MOS) for naturalness and intelligibility. The results show that Tacotron outperformed Tacotron 2 on both naturalness and intelligibility, although both models fell short of human speech quality. Improving TTS technology for Malay can encourage its use in a wider range of contexts.