Many-to-Many Multilingual Translation Model for Languages of Indonesia

Times cited: 1
Authors
Wongso, Wilson [1]
Joyoadikusumo, Ananto [1]
Buana, Brandon Scott [1]
Suhartono, Derwin [1]
Affiliation
[1] Bina Nusantara University, School of Computer Science, Computer Science Department, Jakarta 11480, Indonesia
Keywords
Languages of Indonesia; low-resource languages; mT5; natural language processing; neural machine translation; LOW-RESOURCE;
DOI
10.1109/ACCESS.2023.3308818
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Indonesia is home to over 700 languages, and most people speak their regional language in addition to the lingua franca. In this paper, we focus on multilingual machine translation for 45 regional Indonesian languages and introduce Indo-T5, which uses the mT5 sequence-to-sequence language model as its starting point. We also compare bilingual and multilingual fine-tuning methods and find that our models outperform current state-of-the-art translation models. In addition, we investigate using religious texts from the Bible as an intermediate, mid-resource domain before specializing translation to low-resource domains. Our findings suggest that this two-step fine-tuning approach is highly effective at improving translation quality in low-resource text domains: SacreBLEU scores increase when the models are evaluated on the low-resource NusaX dataset. We release our translation models for other researchers to use.
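The abstract reports translation quality as SacreBLEU scores. As a rough illustration of what that metric measures, here is a minimal sketch of corpus-level BLEU: clipped n-gram precisions up to 4-grams, combined by a geometric mean and scaled by a brevity penalty. It assumes one reference per sentence, plain whitespace tokenization, and no smoothing; the function names are hypothetical, and this is neither SacreBLEU itself nor the authors' evaluation code.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale, one reference per sentence.

    Assumption: whitespace tokenization and no smoothing. SacreBLEU applies its
    own standardized tokenizer, so its scores will differ from this sketch.
    """
    if len(hypotheses) != len(references):
        raise ValueError("need exactly one reference per hypothesis")
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # number of hypothesis n-grams, per n-gram order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # a zero precision makes the geometric mean zero
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)
```

For example, `corpus_bleu(["saya suka makan nasi"], ["saya suka makan nasi"])` gives 100.0, while a three-token hypothesis contains no 4-grams and therefore scores 0.0 without smoothing. For scores comparable to published results, the `sacrebleu` package's `corpus_bleu(hypotheses, [references])` is the standard tool.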
Pages: 91385-91397
Number of pages: 13