Linguistically Driven Multi-Task Pre-Training for Low-Resource Neural Machine Translation

Cited by: 7
Authors
Mao, Zhuoyuan [1 ]
Chu, Chenhui [1 ]
Kurohashi, Sadao [1 ]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Kyoto, Japan
Keywords
Low-resource neural machine translation; pre-training; linguistically-driven
DOI
10.1145/3491065
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Discipline codes
081104 ; 0812 ; 0835 ; 1405
Abstract
In the present study, we propose novel sequence-to-sequence pre-training objectives for low-resource neural machine translation (NMT): Japanese-specific sequence-to-sequence (JASS) for language pairs involving Japanese as the source or target language, and English-specific sequence-to-sequence (ENSS) for language pairs involving English. JASS focuses on masking and reordering Japanese linguistic units known as bunsetsu, whereas ENSS is based on phrase-structure masking and reordering tasks. Experiments on the ASPEC Japanese-English and Japanese-Chinese, Wikipedia Japanese-Chinese, and News English-Korean corpora demonstrate that JASS and ENSS outperform MASS and other existing language-agnostic pre-training methods by up to +2.9 BLEU points for the Japanese-English tasks, up to +7.0 BLEU points for the Japanese-Chinese tasks, and up to +1.3 BLEU points for the English-Korean tasks. Empirical analysis focusing on the relationship between the individual subtasks of JASS and ENSS reveals their complementary nature. Adequacy evaluation using LASER, human evaluation, and case studies reveal that our proposed methods significantly outperform pre-training methods without injected linguistic knowledge, and that they have a larger positive impact on adequacy than on fluency.
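
To make the bunsetsu-level objectives concrete, the following minimal Python sketch shows how the two kinds of pre-training pairs described in the abstract could be constructed: a MASS-style pair that masks one bunsetsu for the model to reconstruct, and a reordering pair that shuffles bunsetsu for the model to restore. It assumes the sentence is already segmented into bunsetsu (e.g., by a Japanese dependency parser); all function and token names here are hypothetical, not the authors' actual implementation.

    import random

    MASK = "<mask>"

    def bunsetsu_mask_pair(bunsetsu, rng=random):
        # Mask one randomly chosen bunsetsu (MASS-style span prediction
        # at the bunsetsu level): the source keeps the other units and
        # replaces the chosen one with a mask token; the target is the
        # masked unit itself.
        i = rng.randrange(len(bunsetsu))
        source = bunsetsu[:i] + [MASK] + bunsetsu[i + 1:]
        return " ".join(source), bunsetsu[i]

    def bunsetsu_reorder_pair(bunsetsu, rng=random):
        # Shuffle the bunsetsu order: the source is the permuted
        # sentence and the target is the original order, so the model
        # learns to restore linguistically meaningful ordering. ENSS
        # would apply the same idea to English phrase-structure
        # constituents instead of bunsetsu.
        shuffled = list(bunsetsu)
        rng.shuffle(shuffled)
        return " ".join(shuffled), " ".join(bunsetsu)

    if __name__ == "__main__":
        # Toy romanized example: "kare wa / hon o / yonda" as three bunsetsu.
        units = ["kare wa", "hon o", "yonda"]
        print(bunsetsu_mask_pair(units))
        print(bunsetsu_reorder_pair(units))
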
Pages: 29
Related papers
25 items in total
  • [1] Pre-Training on Mixed Data for Low-Resource Neural Machine Translation
    Zhang, Wenbo
    Li, Xiao
    Yang, Yating
    Dong, Rui
    INFORMATION, 2021, 12 (03)
  • [2] Low-Resource Neural Machine Translation Using XLNet Pre-training Model
    Wu, Nier
    Hou, Hongxu
    Guo, Ziyue
    Zheng, Wei
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2021, PT V, 2021, 12895 : 503 - 514
  • [3] Character-Aware Low-Resource Neural Machine Translation with Weight Sharing and Pre-training
    Cao, Yichao
    Li, Miao
    Feng, Tao
    Wang, Rujing
    CHINESE COMPUTATIONAL LINGUISTICS, CCL 2019, 2019, 11856 : 321 - 333
  • [4] Pre-training model for low-resource Chinese-Braille translation
    Yu, Hailong
    Su, Wei
    Liu, Lei
    Zhang, Jing
    Cai, Chuan
    Xu, Cunlu
    DISPLAYS, 2023, 79
  • [5] Keeping Models Consistent between Pretraining and Translation for Low-Resource Neural Machine Translation
    Zhang, Wenbo
    Li, Xiao
    Yang, Yating
    Dong, Rui
    Luo, Gongxu
    FUTURE INTERNET, 2020, 12 (12) : 1 - 13
  • [6] A Strategy for Referential Problem in Low-Resource Neural Machine Translation
    Ji, Yatu
    Shi, Lei
    Su, Yila
    Ren, Qing-dao-er-ji
    Wu, Nier
    Wang, Hongbin
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2021, PT V, 2021, 12895 : 321 - 332
  • [7] Augmenting training data with syntactic phrasal-segments in low-resource neural machine translation
    Gupta, Kamal Kumar
    Sen, Sukanta
    Haque, Rejwanul
    Ekbal, Asif
    Bhattacharyya, Pushpak
    Way, Andy
    MACHINE TRANSLATION, 2021, 35 (04) : 661 - 685
  • [8] Semantic Perception-Oriented Low-Resource Neural Machine Translation
    Wu, Nier
    Hou, Hongxu
    Li, Haoran
    Chang, Xin
    Jia, Xiaoning
    MACHINE TRANSLATION, CCMT 2021, 2021, 1464 : 51 - 62
  • [9] Understanding and Improving Low-Resource Neural Machine Translation with Shallow Features
    Sun, Yanming
    Liu, Xuebo
    Wong, Derek F.
    Lin, Yuchu
    Li, Bei
    Zhan, Runzhe
    Chao, Lidia S.
    Zhang, Min
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT III, NLPCC 2024, 2025, 15361 : 227 - 239