A diachronic study determining syntactic and semantic features of Urdu-English neural machine translation
Machine translation produces marginal accuracy rates for low-resource languages, but its deep learning models are expected to yield improved accuracy over time. This longitudinal study investigates how Google Translate's Urdu-to-English translated output evolved between 2018 and 2021. Accuracy an...
Main Authors: | Tamkeen Zehra Shah, Muhammad Imran, Sayed M. Ismail |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2024-01-01 |
Series: | Heliyon |
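Subjects: | Neural machine translation; Urdu; Low-resource language; Google translate; Interlinear gloss; Comparative syntax |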
Online Access: | http://www.sciencedirect.com/science/article/pii/S2405844023100910 |
_version_ | 1797337119081889792 |
---|---|
author | Tamkeen Zehra Shah; Muhammad Imran; Sayed M. Ismail |
author_sort | Tamkeen Zehra Shah |
collection | DOAJ |
description | Machine translation produces marginal accuracy rates for low-resource languages, but its deep learning models are expected to yield improved accuracy over time. This longitudinal study investigates how Google Translate's Urdu-to-English translated output evolved between 2018 and 2021. The accuracy and acceptability of the translations were determined by (a) an interlinear gloss that identifies the core semantic units and grammatical functions to be translated and (b) a descriptive comparison of the translated text's syntactic and semantic properties with those of the source text. Overall, despite a 50% error rate that persists over the three-year interval, the research reports significant improvement in the overall intelligibility of the translations, in contrast to initial results from 2018, which exhibited rampant non-localized errors. Working backwards from instances of errors to the morphosyntactic and semantic patterns underlying them, the study concludes that the pro-drop feature of Urdu, Urdu's case-marking system, identification of clause boundaries, polysemous terms, and orthographically similar words pose the greatest difficulty in neural machine translation. These results point to the need for incorporating syntactic information in training data. |
first_indexed | 2024-03-08T09:04:48Z |
format | Article |
id | doaj.art-c7a00be8afd245eb8b3715e1a4a4de90 |
institution | Directory of Open Access Journals |
issn | 2405-8440 |
language | English |
last_indexed | 2024-03-08T09:04:48Z |
publishDate | 2024-01-01 |
publisher | Elsevier |
record_format | Article |
series | Heliyon |
spelling | doaj.art-c7a00be8afd245eb8b3715e1a4a4de90 2024-02-01T06:30:20Z eng Elsevier Heliyon 2405-8440 2024-01-01 10 1 e22883. A diachronic study determining syntactic and semantic features of Urdu-English neural machine translation. Tamkeen Zehra Shah (0), Muhammad Imran (1), Sayed M. Ismail (2). (0) Institute of Space Technology, Islamabad, Pakistan. (1) Prince Sultan University, Saudi Arabia; The University of Sahiwal, Pakistan; Corresponding author. (2) Prince Sattam bin Abdulaziz University, Saudi Arabia. Machine translation produces marginal accuracy rates for low-resource languages, but its deep learning models are expected to yield improved accuracy over time. This longitudinal study investigates how Google Translate's Urdu-to-English translated output evolved between 2018 and 2021. The accuracy and acceptability of the translations were determined by (a) an interlinear gloss that identifies the core semantic units and grammatical functions to be translated and (b) a descriptive comparison of the translated text's syntactic and semantic properties with those of the source text. Overall, despite a 50% error rate that persists over the three-year interval, the research reports significant improvement in the overall intelligibility of the translations, in contrast to initial results from 2018, which exhibited rampant non-localized errors. Working backwards from instances of errors to the morphosyntactic and semantic patterns underlying them, the study concludes that the pro-drop feature of Urdu, Urdu's case-marking system, identification of clause boundaries, polysemous terms, and orthographically similar words pose the greatest difficulty in neural machine translation. These results point to the need for incorporating syntactic information in training data. http://www.sciencedirect.com/science/article/pii/S2405844023100910 Neural machine translation; Urdu; Low-resource language; Google translate; Interlinear gloss; Comparative syntax |
title | A diachronic study determining syntactic and semantic features of Urdu-English neural machine translation |
topic | Neural machine translation; Urdu; Low-resource language; Google translate; Interlinear gloss; Comparative syntax |
url | http://www.sciencedirect.com/science/article/pii/S2405844023100910 |