A Survey of Orthographic Information in Machine Translation

Cited by: 4
Authors
Chakravarthi B.R. [1]
Rani P. [1]
Arcan M. [2]
McCrae J.P. [1]
Affiliations
[1] Unit for Linguistic Data, Insight SFI Research Centre for Data Analytics, Data Science Institute, National University of Ireland Galway, Galway
[2] Unit for Natural Language Processing, Insight SFI Research Centre for Data Analytics, Data Science Institute, National University of Ireland Galway, Galway
Funding
European Union Horizon 2020; Science Foundation Ireland
Keywords
Machine translation; Neural machine translation; Orthography; Rule-based machine translation; Statistical machine translation; Under-resourced languages
DOI
10.1007/s42979-021-00723-4
Abstract
Machine translation is an application of natural language processing that has been explored for many different languages. Recently, researchers have begun paying attention to machine translation for resource-poor languages and closely related languages. A widespread, underlying problem for these machine translation systems is linguistic difference and variation in orthographic conventions, which causes many issues for traditional approaches. Two languages written in different orthographies are not easily comparable, yet orthographic information can also be used to improve a machine translation system. This article surveys research on the influence of orthography on machine translation of under-resourced languages. It introduces under-resourced languages in the context of machine translation and explains how orthographic information can be utilised to improve machine translation. We describe previous work in this area, discuss the underlying assumptions that were made, and show how orthographic knowledge improves the performance of machine translation of under-resourced languages. We discuss different types of machine translation and highlight a recent trend that seeks to link orthographic information with well-established machine translation methods. Considerable attention is given to current efforts that use cognate information at different levels of machine translation and to the lessons that can be drawn from them. Multilingual neural machine translation of closely related languages is also given particular focus in this survey. The article ends with a discussion of the way forward in machine translation with orthographic information, focusing on multilingual settings and bilingual lexicon induction. © 2021, The Author(s).