Southeast Asian multi-language speech recognition engine


Bibliographic Details
Main Author: Zhang, Keke
Other Authors: Ling Keck Voon
Format: Final Year Project (FYP)
Language: English
Published: Nanyang Technological University 2022
Subjects:
Online Access: https://hdl.handle.net/10356/158457
Description
Summary: In the digital era, technological advances have made daily life easier, faster, and smarter than ever before. Over the past decade, voice services have been adopted across a variety of industries as speech technology has advanced. Market prospects have in turn been boosted as applications such as Automatic Speech Recognition (ASR), Text-To-Speech (TTS) and AI assistants gain wider awareness. Amid the Covid-19 crisis, the global speech technology market remained resilient because speech technology is one of the major enablers of contactless interaction. Moreover, driven by advances in artificial intelligence, speech technology has in recent years become accessible to a wider range of users at a lower cost. As a result, new challenges inevitably arise, and accented speech with language mixing is one of them. This project aims to develop an ASR engine for use in Singapore that can process language-mixed input (English mixed with Mandarin) and produce useful output with a low error rate. The project focuses on automated text corpus collection, language model training, ASR integration and testing. The performance of the ASR engine will be evaluated by Mixed Error Rate (MER).
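
The summary names Mixed Error Rate (MER) as the evaluation metric but does not define it. The following is a minimal sketch, assuming the convention often used for English-Mandarin code-switched speech: Mandarin is scored per character, English per word, and MER is the edit distance between reference and hypothesis divided by the number of reference tokens. The function names, the tokenisation regex, and the lowercasing step are illustrative assumptions, not the project's actual implementation.

```python
import re

# Assumption: one token per CJK character, one token per English word or number.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9']+")


def tokenize_mixed(text):
    """Split mixed English/Mandarin text into scoring units (hypothetical helper)."""
    return [token.lower() for token in _TOKEN.findall(text)]


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mixed_error_rate(reference, hypothesis):
    """MER = (substitutions + deletions + insertions) / reference token count."""
    ref = tokenize_mixed(reference)
    hyp = tokenize_mixed(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)
```

For example, under these assumptions the reference "我想 order 一杯 coffee" has six scoring units (four characters and two words), so a hypothesis that gets only "coffee" wrong scores one error out of six.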