A Crowdsourcing-Based Framework for the Development and Validation of Machine Readable Parallel Corpus for Sign Languages

Sign languages are used by the deaf and mute community of the world. These are gesture-based languages in which signers use their hands and facial expressions to perform different gestures. There are hundreds of different sign languages in the world. Furthermore, like natural languages, there exist diff...

Bibliographic Details
Main Authors: Uzma Farooq, Mohd Shafry Mohd Rahim, Nabeel Sabir Khan, Saim Rasheed, Adnan Abid
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects: Crowdsourcing, HamNoSys, parallel corpus, sign language dictionary, sign writing
Online Access: https://ieeexplore.ieee.org/document/9462917/
author Uzma Farooq
Mohd Shafry Mohd Rahim
Nabeel Sabir Khan
Saim Rasheed
Adnan Abid
collection DOAJ
description Sign languages are used by the deaf and mute community of the world. These are gesture-based languages in which signers use their hands and facial expressions to perform different gestures. There are hundreds of different sign languages in the world. Furthermore, like natural languages, many sign languages have different dialects. To serve the deaf community, several repositories of video gestures are available for many sign languages of the world. These video-based repositories do not support the development of automated language translation systems. This research investigates the idea of engaging the deaf community in the development and validation of a parallel corpus for a sign language and its dialects. As its principal contribution, this research presents a framework for building a parallel corpus for sign languages by harnessing the power of crowdsourcing with an editorial manager; it thus engages a diverse set of stakeholders in building and validating a repository in a quality-controlled manner. It further presents processes to develop a word-level parallel corpus for different dialects of a sign language, and a process to develop a sentence-level translation corpus comprising source and translated sentences. The proposed framework has been successfully implemented and has involved different stakeholders in building the corpus. As a result, a word-level parallel corpus comprising the gestures of almost 700 words of Pakistan Sign Language (PSL) has been developed. A sentence-level translation corpus comprising more than 8000 sentences in different tenses has also been developed for PSL. This sentence-level corpus can be used in developing and evaluating machine translation models for natural-to-sign-language translation and vice versa, while the machine-readable word-level parallel corpus will help in generating avatar-based videos for the translated sentences in different dialects of a sign language.
first_indexed 2024-12-14T16:27:22Z
format Article
id doaj.art-4a634d06e3aa4114b79522620c795633
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-14T16:27:22Z
publishDate 2021-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-4a634d06e3aa4114b79522620c795633, 2022-12-21T22:54:39Z, eng
IEEE, IEEE Access, ISSN 2169-3536, 2021-01-01, vol. 9, pp. 91788-91806, DOI 10.1109/ACCESS.2021.3091433, article 9462917
A Crowdsourcing-Based Framework for the Development and Validation of Machine Readable Parallel Corpus for Sign Languages
Uzma Farooq (https://orcid.org/0000-0001-8213-2469), Department of Computer Science, Universiti Teknologi Malaysia, Johor Baharu, Malaysia
Mohd Shafry Mohd Rahim, Department of Computer Science, Universiti Teknologi Malaysia, Johor Baharu, Malaysia
Nabeel Sabir Khan (https://orcid.org/0000-0003-0758-6019), School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
Saim Rasheed (https://orcid.org/0000-0003-1584-2991), Faculty of Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
Adnan Abid (https://orcid.org/0000-0003-2602-2876), School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
title A Crowdsourcing-Based Framework for the Development and Validation of Machine Readable Parallel Corpus for Sign Languages
topic Crowdsourcing
HamNoSys
parallel corpus
sign language dictionary
sign writing
url https://ieeexplore.ieee.org/document/9462917/