Many-to-Many Multilingual Translation Model for Languages of Indonesia

Cited by: 1
Authors
Wongso, Wilson [1]
Joyoadikusumo, Ananto [1]
Buana, Brandon Scott [1]
Suhartono, Derwin [1]
Affiliations
[1] Bina Nusantara Univ, Sch Comp Sci, Comp Sci Dept, Jakarta 11480, Indonesia
Keywords
Languages of Indonesia; low-resource languages; mT5; natural language processing; neural machine translation; LOW-RESOURCE
DOI
10.1109/ACCESS.2023.3308818
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Indonesia is home to over 700 languages, and most people speak their respective regional languages in addition to the lingua franca. In this paper, we focus on the task of multilingual machine translation for 45 regional Indonesian languages and introduce Indo-T5, which leverages the mT5 sequence-to-sequence language model as a baseline. We compare the performance of bilingual and multilingual fine-tuning methods and find that our models outperform current state-of-the-art translation models. We further investigate the use of religious texts from the Bible as an intermediate mid-resource translation domain for low-resource translation domain specialization. Our findings suggest that this two-step fine-tuning approach is highly effective in improving the quality of translations for low-resource text domains. Our results show an increase in SacreBLEU scores when evaluated on the low-resource NusaX dataset. We release our translation models for other researchers to leverage.
Pages: 91385-91397
Number of pages: 13