A Survey of Orthographic Information in Machine Translation

Cited by: 4
Authors
Chakravarthi B.R. [1]
Rani P. [1]
Arcan M. [2]
McCrae J.P. [1]
Affiliations
[1] Unit for Linguistic Data, Insight SFI Research Centre for Data Analytics, Data Science Institute, National University of Ireland Galway, Galway
[2] Unit for Natural Language Processing, Insight SFI Research Centre for Data Analytics, Data Science Institute, National University of Ireland Galway, Galway
Funding
European Union Horizon 2020; Science Foundation Ireland
Keywords
Machine translation; Neural machine translation; Orthography; Rule-based machine translation; Statistical machine translation; Under-resourced languages
DOI
10.1007/s42979-021-00723-4
Abstract
Machine translation is one of the applications of natural language processing that has been explored for many languages. Recently, researchers have started paying attention to machine translation for resource-poor languages and closely related languages. A widespread underlying problem for these machine translation systems is linguistic difference and variation in orthographic conventions, which cause many issues for traditional approaches. Two languages written in two different orthographies are not easily comparable, but orthographic information can also be used to improve a machine translation system. This article offers a survey of research on the influence of orthography on machine translation of under-resourced languages. It introduces under-resourced languages in the context of machine translation and explains how orthographic information can be utilised to improve machine translation. We describe previous work in this area, discussing what underlying assumptions were made and showing how orthographic knowledge improves the performance of machine translation of under-resourced languages. We discuss different types of machine translation and demonstrate a recent trend that seeks to link orthographic information with well-established machine translation methods. Considerable attention is given to current efforts that use cognate information at different levels of machine translation and to the lessons that can be drawn from them. Additionally, multilingual neural machine translation of closely related languages is given particular focus in this survey. The article ends with a discussion of the way forward in machine translation with orthographic information, focusing on multilingual settings and bilingual lexicon induction. © 2021, The Author(s).
Related papers
50 in total
  • [11] Machine translation for Arabic dialects (survey)
    Harrat, Salima
    Meftouh, Karima
    Smaili, Kamel
    INFORMATION PROCESSING & MANAGEMENT, 2019, 56 (02) : 262 - 273
  • [12] A Survey of Multilingual Neural Machine Translation
    Dabre, Raj
    Chu, Chenhui
    Kunchukuttan, Anoop
    ACM COMPUTING SURVEYS, 2020, 53 (05)
  • [13] A Survey of Research and Application of NLP-based Machine Translation
    Liu, Youyao
    Ma, Yuechi
    Zhou, Sicong
    Luo, Xun
    2024 6TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING, ICNLP 2024, 2024, : 315 - 319
  • [14] A comprehensive survey on machine translation for English, Hindi and Sanskrit languages
    Sitender
    Bawa, Seema
    Kumar, Munish
    Sangeeta
    JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2021, 14 (4) : 3441 - 3474
  • [15] Statistical Machine Translation Enhancements through Linguistic Levels: A Survey
    Costa-Jussa, Marta R.
    Farrus, Mireia
    ACM COMPUTING SURVEYS, 2014, 46 (03)
  • [16] A comprehensive survey on machine translation for English, Hindi and Sanskrit languages
    Seema Sitender
    Munish Bawa
    Journal of Ambient Intelligence and Humanized Computing, 2023, 14 : 3441 - 3474
  • [17] Machine translation and fair access to information
    Nurminen, Mary
    Koponen, Maarit
    TRANSLATION SPACES, 2020, 9 (01) : 150 - 169
  • [18] Machine Translation and Disclosure of Patent Information
    Larroyed, Aline A.
    IIC-INTERNATIONAL REVIEW OF INTELLECTUAL PROPERTY AND COMPETITION LAW, 2018, 49 (07) : 763 - 786
  • [19] A critique of Statistical Machine Translation
    Way, Andy
    LINGUISTICA ANTVERPIENSIA NEW SERIES-THEMES IN TRANSLATION STUDIES, 2009, 8 : 17 - 41
  • [20] A Survey of Machine Translation Tasks on Nigerian Languages
    Nwafor, Ebelechukwu
    Andy, Anietie
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6480 - 6486