A conceptual framework for Malay-English mixed-language question answering system

Bibliographic Details
Main Authors: Lim, H. T., Huspi, S. H., Ibrahim, R.
Format: Conference or Workshop Item
Language: English
Published: 2021
Online Access: http://eprints.utm.my/95670/1/LimHuiTing2021_AConceptualFramework.pdf
Description
Summary: Mixed language, the combination of two or more languages in spoken or written form, has become a common trend in language use. It is widely used in social media forums to improve communication and to allow a greater range of expression. Current question answering (QA) systems support only monolingual queries, which restricts the ability of multilingual users to interact naturally with them. In recent years, interest in multilingual QA systems has grown, and the most common solution combines translation models with machine learning algorithms for question classification. However, when words from other languages appear in a single sentence, these systems cannot reliably distinguish code-switched sentences from monolingual ones. Mixing languages also limits the language context that can be captured, because machine translation produces mistranslated questions. Informal mixed-language text, which breaks the linguistic rules of the individual languages, poses a further challenge for automated QA systems, since they must translate the given questions and extract answers for them. Additionally, the lack of public resources such as chunkers, POS taggers, and WordNet for mixed language, especially Malay-English, leads to poor performance of the translation and classification models. Furthermore, machine learning algorithms for question classification require a large amount of structured training data to perform well. This is impractical in the Malay-English mixed-language domain, where the availability of mixed-language datasets remains an issue. To address these problems, we propose a framework that combines enhanced translation models with deep learning, using Convolutional Neural Networks (CNN) for Malay-English mixed-language question classification in order to generate the best answer.
The first part studies the machine translation model, for which word-level language identification and text normalization of Malay-English mixed-language questions will be developed. The second part focuses on the deep learning algorithm, exploring CNN as the classification model that helps provide the best answer to the translated questions. This paper therefore presents a framework consisting of an enhanced translation model for Malay-English and an end-to-end Malay-English mixed-language QA system. This research will contribute significantly to multilingual forum platforms and to intelligent QA systems (chatbots).
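
To make the first stage more concrete, the following is a minimal Python sketch of word-level language identification preceded by text normalization. It is not the authors' method: the abstract does not describe their implementation. The tiny word lists, the normalization table, and all function names (`tokenize`, `normalize`, `identify_language`, `tag_question`, `is_code_switched`) are hypothetical and chosen only for illustration. A dictionary lookup stands in for whatever identification model the full framework would use.

```python
import re

# Toy lexicons, purely illustrative assumptions. A real system would need
# far larger resources, which the abstract notes are scarce for Malay-English.
MALAY_WORDS = {"saya", "nak", "macam", "mana", "boleh", "tak", "apa", "di", "yang", "untuk"}
ENGLISH_WORDS = {"how", "to", "register", "the", "course", "online", "what", "is", "for", "can", "you"}

# Hypothetical table mapping informal spellings to standard forms.
NORMALIZATION = {"x": "tak", "mcm": "macam", "blh": "boleh", "sy": "saya", "u": "you"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def normalize(token):
    """Replace a known informal spelling with its standard form."""
    return NORMALIZATION.get(token, token)


def identify_language(token):
    """Label one token as 'ms', 'en', 'ambiguous', or 'unknown' by lexicon lookup."""
    in_ms = token in MALAY_WORDS
    in_en = token in ENGLISH_WORDS
    if in_ms and in_en:
        return "ambiguous"
    if in_ms:
        return "ms"
    if in_en:
        return "en"
    return "unknown"


def tag_question(text):
    """Normalize every token, then pair it with its language label."""
    return [(tok, identify_language(tok)) for tok in map(normalize, tokenize(text))]


def is_code_switched(tagged):
    """A question counts as code-switched if it holds both Malay and English tokens."""
    langs = {lang for _, lang in tagged}
    return {"ms", "en"} <= langs


if __name__ == "__main__":
    tagged = tag_question("Mcm mana nak register course?")
    print(tagged)
    print(is_code_switched(tagged))
```

In a pipeline of the kind the abstract outlines, the per-token labels could then tell the translation step which words to translate and which to leave unchanged, before the translated question is passed to the classifier.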