A diachronic study determining syntactic and semantic features of Urdu-English neural machine translation

Bibliographic Details
Main Authors: Tamkeen Zehra Shah, Muhammad Imran, Sayed M. Ismail
Format: Article
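Language: English
Published: Elsevier 2024-01-01
Series: Heliyon
Subjects: Neural machine translation; Urdu; Low-resource language; Google translate; Interlinear gloss; Comparative syntax
Online Access: http://www.sciencedirect.com/science/article/pii/S2405844023100910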
collection DOAJ
description Machine translation produces marginal accuracy rates for low-resource languages, but its deep learning models are expected to yield improved accuracy over time. This longitudinal study investigates how Google Translate's Urdu-to-English output evolved between 2018 and 2021. The accuracy and acceptability of the translations were determined by (a) an interlinear gloss that identifies the core semantic units and grammatical functions to be translated, and (b) a descriptive comparison of the translated text's syntactic and semantic properties with those of the source text. Overall, despite a 50% error rate that persists over the three-year interval, the research reports significant improvement in the overall intelligibility of the translations, in contrast to the initial results from 2018, which exhibited rampant non-localized errors. Working backwards from instances of errors to the morphosyntactic and semantic patterns underlying them, the study concludes that the pro-drop feature of Urdu, Urdu's case-marking system, the identification of clause boundaries, polysemous terms, and orthographically similar words pose the greatest difficulty in neural machine translation. These results point to the need for incorporating syntactic information in training data.
first_indexed 2024-03-08T09:04:48Z
format Article
id doaj.art-c7a00be8afd245eb8b3715e1a4a4de90
institution Directory Open Access Journal
issn 2405-8440
language English
last_indexed 2024-03-08T09:04:48Z
publishDate 2024-01-01
publisher Elsevier
record_format Article
series Heliyon
spelling doaj.art-c7a00be8afd245eb8b3715e1a4a4de90 2024-02-01T06:30:20Z eng Elsevier Heliyon 2405-8440 2024-01-01 vol. 10, no. 1, e22883
A diachronic study determining syntactic and semantic features of Urdu-English neural machine translation
Tamkeen Zehra Shah (Institute of Space Technology, Islamabad, Pakistan)
Muhammad Imran (Prince Sultan University, Saudi Arabia; The University of Sahiwal, Pakistan; corresponding author)
Sayed M. Ismail (Prince Sattam bin Abdulaziz University, Saudi Arabia)
http://www.sciencedirect.com/science/article/pii/S2405844023100910
Neural machine translation; Urdu; Low-resource language; Google translate; Interlinear gloss; Comparative syntax
title A diachronic study determining syntactic and semantic features of Urdu-English neural machine translation
topic Neural machine translation
Urdu
Low-resource language
Google translate
Interlinear gloss
Comparative syntax