A Survey of Orthographic Information in Machine Translation

Cited by: 4
Authors
Chakravarthi B.R. [1]
Rani P. [1]
Arcan M. [2]
McCrae J.P. [1]
Affiliations
[1] Unit for Linguistic Data, Insight SFI Research Centre for Data Analytics, Data Science Institute, National University of Ireland Galway, Galway
[2] Unit for Natural Language Processing, Insight SFI Research Centre for Data Analytics, Data Science Institute, National University of Ireland Galway, Galway
Funding
EU Horizon 2020; Science Foundation Ireland
Keywords
Machine translation; Neural machine translation; Orthography; Rule-based machine translation; Statistical machine translation; Under-resourced languages
DOI
10.1007/s42979-021-00723-4
Abstract
Machine translation is one of the applications of natural language processing that has been explored in different languages. Recently, researchers have started paying attention to machine translation for resource-poor languages and closely related languages. A widespread, underlying problem for these machine translation systems is linguistic difference and variation in orthographic conventions, which cause many issues for traditional approaches. Two languages written in two different orthographies are not easily comparable, but orthographic information can also be used to improve a machine translation system. This article offers a survey of research on orthography’s influence on machine translation of under-resourced languages. It introduces under-resourced languages in the context of machine translation and explains how orthographic information can be utilised to improve machine translation. We describe previous work in this area, discussing what underlying assumptions were made, and showing how orthographic knowledge improves the performance of machine translation of under-resourced languages. We discuss different types of machine translation and demonstrate a recent trend that seeks to link orthographic information with well-established machine translation methods. Considerable attention is given to current efforts using cognate information at different levels of machine translation and the lessons that can be drawn from them. Additionally, multilingual neural machine translation of closely related languages is given a particular focus in this survey. The article ends with a discussion of the way forward in machine translation with orthographic information, focusing on multilingual settings and bilingual lexicon induction. © 2021, The Author(s).