Improving Multilingual Neural Machine Translation System for Indic Languages

Cited by: 4
Authors
Das, Sudhansu Bala [1]
Biradar, Atharv [2]
Mishra, Tapas Kumar [1]
Patra, Bidyut Kr. [3]
Affiliations
[1] Natl Inst Technol NIT, Rourkela 769008, Odisha, India
[2] Pune Inst Comp Technol PICT, Pune, Maharashtra, India
[3] Indian Inst Technol IIT, Varanasi, Uttar Pradesh, India
Keywords
Multilingual neural machine translation system (MNMT); Indic languages (ILs); low-resource language; corpus; BLEU score
DOI
10.1145/3587932
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A machine translation system (MTS) serves as an effective tool for communication by translating text or speech from one language to another. Recently, neural machine translation (NMT) has become popular for its performance and cost-effectiveness. However, NMT systems are limited in translating low-resource languages because a large quantity of data is required to learn useful mappings across languages. The need for an efficient translation system is evident in a large multilingual environment like India. Indian languages (ILs) are still treated as low-resource languages because of the unavailability of corpora. To address this asymmetry, the multilingual neural machine translation (MNMT) system has emerged as a suitable approach. An MNMT system translates many languages using a single model, which is highly useful for the training process and for lowering online maintenance costs. It also helps improve low-resource translation. In this article, we propose an MNMT system to address the issues related to low-resource language translation. Our model comprises two MNMT systems, one for English-Indic (one-to-many) and one for Indic-English (many-to-one), with a shared encoder-decoder covering 15 language pairs (30 translation directions). Since most IL pairs have only a scanty amount of parallel corpora, insufficient for training any machine translation model, we explore various augmentation strategies to improve overall translation quality through the proposed model. A state-of-the-art transformer architecture is used to realize the proposed model. In addition, the article addresses the use of language relationships (in terms of dialect, script, etc.), particularly the role of high-resource languages of the same family in boosting the performance of low-resource languages. Moreover, the experimental results show the advantage of back-translation and domain adaptation for ILs in enhancing the translation quality of both source and target languages. Using all these key approaches, our proposed model proves more efficient than the baseline model in terms of the evaluation metric, i.e., the BLEU (BiLingual Evaluation Understudy) score, for a set of ILs.
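As a rough illustration of the one-to-many and many-to-one setup described in the abstract, the sketch below shows one common way a single shared encoder-decoder can be told which language to produce: a target-language token is prepended to each source sentence. This is an assumption for illustration only; the abstract does not say that the authors use this tagging scheme. The language codes, the `<2xx>` token format, and the function names are all hypothetical.

```python
# Minimal sketch, NOT the authors' confirmed method: target-language tagging
# for a shared multilingual model. All names and codes here are hypothetical.

def tag_source(sentence: str, target_lang: str) -> str:
    """Prepend a target-language token so one shared model knows the output language."""
    return f"<2{target_lang}> {sentence}"


def build_directions(indic_langs):
    """Enumerate English-Indic (one-to-many) and Indic-English (many-to-one) directions."""
    one_to_many = [("en", lang) for lang in indic_langs]
    many_to_one = [(lang, "en") for lang in indic_langs]
    return one_to_many, many_to_one


def make_training_examples(parallel_pairs, target_lang):
    """Turn (source, target) sentence pairs into tagged training examples."""
    return [(tag_source(src, target_lang), tgt) for src, tgt in parallel_pairs]


# Placeholder codes standing in for the 15 Indic languages (not listed in the abstract).
indic_langs = [f"il{i:02d}" for i in range(15)]
one_to_many, many_to_one = build_directions(indic_langs)
total_directions = len(one_to_many) + len(many_to_one)
```

With 15 language pairs, the two lists together cover 15 x 2 = 30 translation directions, matching the count given in the abstract.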
Pages: 24