On the Pareto Front of Multilingual Neural Machine Translation

Cited by: 0
Authors
Chen, Liang [1]
Ma, Shuming [2]
Zhang, Dongdong [2]
Wei, Furu [2]
Chang, Baobao [1]
Affiliations
[1] Peking Univ, Sch Comp Sci, Natl Key Lab Multimedia Informat Proc, Beijing, Peoples R China
[2] Microsoft Res, Redmond, WA USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023
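Funding
National Science Foundation (United States);
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract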
In this work, we study how the performance of a given direction changes with its sampling ratio in Multilingual Neural Machine Translation (MNMT). By training over 200 multilingual models with various model sizes, data sizes, and language directions, we find that the performance of a given translation direction does not always improve as its weight in the multi-task optimization objective increases. Accordingly, the scalarization method leads to a multi-task trade-off front that deviates from the traditional Pareto front when the training corpus is imbalanced, which makes it very challenging to improve the overall performance of all directions. Based on our observations, we propose the Double Power Law to predict the unique performance trade-off front in MNMT, which is robust across various languages, levels of data adequacy, and numbers of tasks. Finally, we formulate the sampling ratio selection problem in MNMT as an optimization problem based on the Double Power Law. In our experiments, it achieves better performance than temperature searching and gradient manipulation methods with only 1/5 to 1/2 of the total training budget. We release the code at https://github.com/pkunlp-icler/ParetoMNMT for reproduction.
Pages: 14