Summary: | Multilingual question answering has recently emerged as an active research topic and has attracted considerable attention. Although systems for English and other rich-resource languages, built on advanced deep-learning techniques, are highly developed, most systems for low-resource languages remain impractical due to insufficient data. Accordingly, many studies have attempted to improve performance on low-resource languages in a zero-shot or few-shot manner using multilingual bidirectional encoder representations from transformers (mBERT), transferring knowledge learned from rich-resource languages to low-resource ones. Most such methods, however, still require either a large amount of unlabeled data or a small set of labeled data in the low-resource languages. In Wikipedia, 169 languages have fewer than 10,000 articles, and 48 languages have fewer than 1,000. This scarcity motivates us to tackle zero-shot multilingual question answering under a zero-resource scenario. This study therefore proposes a framework that fine-tunes the original mBERT using data from rich-resource languages only, so that the resulting model can be applied to low-resource languages in a zero-shot, zero-resource manner. Compared with several baseline systems, which require millions of unlabeled examples in the low-resource languages, our framework is not only highly competitive but also performs better on the languages used in training.
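The summary does not give the authors' exact training recipe, so the following is only a minimal sketch of the general transfer setup it describes, assuming the HuggingFace Transformers and Datasets libraries, English SQuAD as the rich-resource training data, and the public bert-base-multilingual-cased checkpoint; these names and hyperparameters are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch of zero-shot cross-lingual QA transfer:
# fine-tune mBERT for extractive QA on a rich-resource language
# (English SQuAD here), then apply it unchanged to other languages.
# Checkpoint, dataset, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForQuestionAnswering,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
    default_data_collator,
    pipeline,
)

model_name = "bert-base-multilingual-cased"  # the original mBERT
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

squad = load_dataset("squad")  # rich-resource (English) training data


def preprocess(examples):
    """Tokenize question/context pairs and map character-level answer
    spans to token-level start/end positions for extractive QA."""
    inputs = tokenizer(
        examples["question"],
        examples["context"],
        max_length=384,
        truncation="only_second",  # truncate the context, not the question
        padding="max_length",
        return_offsets_mapping=True,
    )
    start_positions, end_positions = [], []
    for i, offsets in enumerate(inputs["offset_mapping"]):
        answer = examples["answers"][i]
        start_char = answer["answer_start"][0]
        end_char = start_char + len(answer["text"][0])
        seq_ids = inputs.sequence_ids(i)
        ctx_start = seq_ids.index(1)  # first context token
        ctx_end = len(seq_ids) - 1 - seq_ids[::-1].index(1)  # last one
        if offsets[ctx_start][0] > start_char or offsets[ctx_end][1] < end_char:
            # Answer was truncated out of the window; label with [CLS].
            start_positions.append(0)
            end_positions.append(0)
        else:
            idx = ctx_start
            while idx <= ctx_end and offsets[idx][0] <= start_char:
                idx += 1
            start_positions.append(idx - 1)
            idx = ctx_end
            while idx >= ctx_start and offsets[idx][1] >= end_char:
                idx -= 1
            end_positions.append(idx + 1)
    inputs["start_positions"] = start_positions
    inputs["end_positions"] = end_positions
    inputs.pop("offset_mapping")
    return inputs


train_set = squad["train"].map(
    preprocess, batched=True, remove_columns=squad["train"].column_names
)

trainer = Trainer(
    model=model,
    args=TrainingArguments("mbert-squad", per_device_train_batch_size=16,
                           learning_rate=3e-5, num_train_epochs=2),
    train_dataset=train_set,
    data_collator=default_data_collator,
)
trainer.train()

# Zero-shot, zero-resource inference: the fine-tuned model answers
# questions in a low-resource language it never saw any data for.
qa = pipeline("question-answering", model=model, tokenizer=tokenizer)
# Placeholders; question and context in any target language go here.
print(qa(question="...", context="..."))
```

The key design point this sketch illustrates is that no target-language data enters the pipeline at any stage: mBERT's shared multilingual representations carry the QA skill learned from the rich-resource language to languages with no labeled or unlabeled training data at all.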