A Survey of Orthographic Information in Machine Translation

Cited by: 4
Authors
Chakravarthi B.R. [1]
Rani P. [1]
Arcan M. [2]
McCrae J.P. [1]
Affiliations
[1] Unit for Linguistic Data, Insight SFI Research Centre for Data Analytics, Data Science Institute, National University of Ireland Galway, Galway
[2] Unit for Natural Language Processing, Insight SFI Research Centre for Data Analytics, Data Science Institute, National University of Ireland Galway, Galway
Funding
European Union Horizon 2020; Science Foundation Ireland
Keywords
Machine translation; Neural machine translation; Orthography; Rule-based machine translation; Statistical machine translation; Under-resourced languages
DOI
10.1007/s42979-021-00723-4
Abstract
Machine translation is one of the applications of natural language processing that has been explored for many different languages. Recently, researchers have started paying attention to machine translation for resource-poor languages and closely related languages. A widespread underlying problem for these machine translation systems is the linguistic difference and variation in orthographic conventions, which causes many issues for traditional approaches. Two languages written in two different orthographies are not easily comparable, but orthographic information can also be used to improve machine translation systems. This article offers a survey of research on the influence of orthography on machine translation of under-resourced languages. It introduces under-resourced languages in terms of machine translation and explains how orthographic information can be utilised to improve machine translation. We describe previous work in this area, discussing the underlying assumptions that were made and showing how orthographic knowledge improves the performance of machine translation of under-resourced languages. We discuss different types of machine translation and demonstrate a recent trend that seeks to link orthographic information with well-established machine translation methods. Considerable attention is given to current efforts using cognate information at different levels of machine translation and the lessons that can be drawn from them. Additionally, multilingual neural machine translation of closely related languages is given particular focus in this survey. This article ends with a discussion of the way forward in machine translation with orthographic information, focusing on multilingual settings and bilingual lexicon induction. © 2021, The Author(s).