Many-to-Many Multilingual Translation Model for Languages of Indonesia

Times cited: 1
Authors
Wongso, Wilson [1]
Joyoadikusumo, Ananto [1]
Buana, Brandon Scott [1]
Suhartono, Derwin [1]
Affiliation
[1] Bina Nusantara University, School of Computer Science, Computer Science Department, Jakarta 11480, Indonesia
Keywords
Languages of Indonesia; low-resource languages; mT5; natural language processing; neural machine translation; LOW-RESOURCE
DOI
10.1109/ACCESS.2023.3308818
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Indonesia is home to over 700 languages, and most people speak their respective regional languages in addition to the lingua franca. In this paper, we focus on multilingual machine translation for 45 regional Indonesian languages and introduce Indo-T5, which uses the mT5 sequence-to-sequence language model as its base. We also compare bilingual and multilingual fine-tuning methods and find that our models outperform current state-of-the-art translation models. In addition, we investigate the use of religious texts from the Bible as an intermediate, mid-resource translation domain for specializing models to low-resource translation domains. Our findings suggest that this two-step fine-tuning approach is highly effective in improving translation quality for low-resource text domains: our results show an increase in SacreBLEU scores when evaluated on the low-resource NusaX dataset. We release our translation models for other researchers to use.
Pages: 91385-91397
Number of pages: 13