Summary: | Sign languages are used by deaf and mute communities around the world. They are gesture-based languages in which signers use their hands and facial expressions to perform different gestures. There are hundreds of different sign languages in the world, and, like spoken languages, many sign languages have several dialects. To support the deaf community, repositories of video gestures are available for many of the world's sign languages. However, these video-based repositories do not support the development of automated language translation systems. This research investigates the idea of engaging the deaf community in the development and validation of a parallel corpus for a sign language and its dialects. As its principal contribution, this research presents a framework for building a parallel corpus for sign languages by harnessing crowdsourcing together with an editorial manager. The framework thus engages a diverse set of stakeholders in building and validating a repository in a quality-controlled manner. It further presents a process for developing a word-level parallel corpus for different dialects of a sign language, and a process for developing a sentence-level translation corpus consisting of source and translated sentences. The proposed framework has been implemented, and different stakeholders were involved in building the corpus. As a result, a word-level parallel corpus comprising the gestures for almost 700 words of Pakistan Sign Language (PSL) has been developed. In addition, a sentence-level translation corpus of more than 8,000 sentences covering different tenses has been developed for PSL. This sentence-level corpus can be used to develop and evaluate machine translation models that translate natural language text into sign language and vice versa. The machine-readable word-level parallel corpus, in turn, can help generate avatar-based videos of the translated sentences in different dialects of a sign language.
|