A diachronic study determining syntactic and semantic features of Urdu-English neural machine translation

Cited by: 8
Authors
Shah, Tamkeen Zehra [1]
Imran, Muhammad [2, 3]
Ismail, Sayed M. [4]
Affiliations
[1] Inst Space Technol, Islamabad, Pakistan
[2] Prince Sultan Univ, Riyadh, Saudi Arabia
[3] Univ Sahiwal, Sahiwal, Pakistan
[4] Prince Sattam Bin Abdulaziz Univ, Alkharj, Saudi Arabia
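Keywords
Neural machine translation; Urdu; Low-resource language; Google Translate; Interlinear gloss; Comparative syntax
DOI
10.1016/j.heliyon.2023.e22883
Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract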
Machine translation produces marginal accuracy rates for low-resource languages, but its deep learning models are expected to yield improved accuracy over time. This longitudinal study investigates how Google Translate's Urdu-to-English output evolved between 2018 and 2021. The accuracy and acceptability of the translations were determined by (a) an interlinear gloss that identifies the core semantic units and grammatical functions to be translated, and (b) a descriptive comparison of the translated text's syntactic and semantic properties with those of the source text. Despite a 50% error rate that persists over the three-year interval, the research reports significant improvement in the overall intelligibility of the translations, in contrast to the initial results from 2018, which exhibited rampant non-localized errors. Working backwards from instances of errors to the morphosyntactic and semantic patterns underlying them, the study concludes that Urdu's pro-drop feature, its case-marking system, the identification of clause boundaries, polysemous terms, and orthographically similar words pose the greatest difficulty for neural machine translation. These results point to the need to incorporate syntactic information in training data.
Pages: 16