JASS: Japanese-specific Sequence to Sequence Pre-training for Neural Machine Translation

Cited by: 0
Authors
Mao, Zhuoyuan [1]
Cromieres, Fabien [1]
Dabre, Raj [2]
Song, Haiyue [1]
Kurohashi, Sadao [1]
Affiliations
[1] Kyoto Univ, Kyoto, Japan
[2] Natl Inst Informat & Commun Technol, Kyoto, Japan
Source
PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020) | 2020
Keywords
pre-training; neural machine translation; bunsetsu; low resource
DOI
Not available
Chinese Library Classification (CLC)
TP39 [Computer applications]
Discipline Classification Codes
081203; 0835
Abstract
Neural machine translation (NMT) needs large parallel corpora for state-of-the-art translation quality. Low-resource NMT is typically addressed by transfer learning, which leverages large monolingual or parallel corpora for pre-training. Monolingual pre-training approaches such as MASS (MAsked Sequence to Sequence) are extremely effective in boosting NMT quality for languages with small parallel corpora. However, they do not account for linguistic information obtained using syntactic analyzers, which is known to be invaluable for several Natural Language Processing (NLP) tasks. To this end, we propose JASS, Japanese-specific Sequence to Sequence, as a novel pre-training alternative to MASS for NMT involving Japanese as the source or target language. JASS is joint BMASS (Bunsetsu MASS) and BRSS (Bunsetsu Reordering Sequence to Sequence) pre-training, which focuses on Japanese linguistic units called bunsetsus. In our experiments on ASPEC Japanese-English and News Commentary Japanese-Russian translation, we show that JASS can give results that are competitive with, if not better than, those given by MASS. Furthermore, we show for the first time that joint MASS and JASS pre-training gives results that significantly surpass the individual methods, indicating their complementary nature. We will release our code, pre-trained models and bunsetsu-annotated data as resources for researchers to use in their own NLP tasks.
Pages: 3683-3691
Number of pages: 9
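The abstract describes the BMASS and BRSS objectives only at a high level. The Python sketch below is a toy illustration, under our own assumptions, of what bunsetsu-level masking and bunsetsu-level reordering could look like as data-preparation steps: it uses a hypothetical, already-segmented toy sentence (in practice a Japanese analyzer would produce the bunsetsu chunks), a single placeholder mask token per masked chunk, and an arbitrary 50% chunk mask ratio. It is not the authors' released implementation; the paper's own code release should be consulted for the actual pre-training objectives.

```python
import random

# Toy, pre-segmented sentence: each inner list is one bunsetsu chunk
# (hypothetical segmentation; a real pipeline would obtain these from a
# Japanese syntactic analyzer).
bunsetsus = [["私は"], ["美味しい", "寿司を"], ["友達と"], ["食べた"]]


def bmass_example(chunks, mask_token="<mask>", mask_ratio=0.5, seed=0):
    """BMASS-style toy sample: mask whole bunsetsu chunks on the source side
    and have the decoder reconstruct the tokens of the masked chunks."""
    rng = random.Random(seed)
    n_mask = max(1, int(len(chunks) * mask_ratio))
    masked_ids = set(rng.sample(range(len(chunks)), n_mask))
    source, target = [], []
    for i, chunk in enumerate(chunks):
        if i in masked_ids:
            source.append(mask_token)  # one placeholder per masked bunsetsu
            target.extend(chunk)       # decoder predicts the masked tokens
        else:
            source.extend(chunk)
    return source, target


def brss_example(chunks, seed=0):
    """BRSS-style toy sample: the source is a bunsetsu-level permutation of
    the sentence; the target is the sentence in its original order."""
    rng = random.Random(seed)
    shuffled = chunks[:]
    rng.shuffle(shuffled)
    source = [tok for chunk in shuffled for tok in chunk]
    target = [tok for chunk in chunks for tok in chunk]
    return source, target


if __name__ == "__main__":
    print("BMASS:", bmass_example(bunsetsus))
    print("BRSS: ", brss_example(bunsetsus))
```

The intended contrast with plain MASS, as suggested by the abstract, is that the masked and reordered units are whole bunsetsus rather than arbitrary token spans, so the pre-training signal respects Japanese phrase boundaries.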