InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training

Cited by: 0
Authors
Chi, Zewen [1 ,2 ]
Dong, Li [2 ]
Wei, Furu [2 ]
Yang, Nan [2 ]
Singhal, Saksham [2 ]
Wang, Wenhui [2 ]
Song, Xia [2 ]
Mao, Xian-Ling [1 ]
Huang, Heyan [1 ]
Zhou, Ming [2 ]
Affiliations
[1] Beijing Inst Technol, Beijing, Peoples R China
[2] Microsoft Corp, Redmond, WA 98052 USA
Source
2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021) | 2021
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
In this work, we present an information-theoretic framework that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual-multi-granularity texts. The unified view helps us to better understand the existing methods for learning cross-lingual representations. More importantly, inspired by the framework, we propose a new pretraining task based on contrastive learning. Specifically, we regard a bilingual sentence pair as two views of the same meaning and encourage their encoded representations to be more similar than the negative examples. By leveraging both monolingual and parallel corpora, we jointly train the pretext tasks to improve the cross-lingual transferability of pre-trained models. Experimental results on several benchmarks show that our approach achieves considerably better performance. The code and pre-trained models are available at https://aka.ms/infoxlm.
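The contrastive objective sketched in the abstract (a bilingual sentence pair treated as two views of the same meaning, pulled together while other sentences serve as negatives) can be illustrated with a short InfoNCE-style loss. The following is a minimal sketch under assumed choices, not the authors' released implementation: the function name xlco_style_loss, the use of in-batch negatives, and the temperature value are illustrative assumptions, and the encoder that would produce the sentence embeddings is omitted.

import torch
import torch.nn.functional as F


def xlco_style_loss(src_emb: torch.Tensor,
                    tgt_emb: torch.Tensor,
                    temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE-style contrastive loss over a batch of bilingual sentence pairs.

    src_emb and tgt_emb are (batch, dim) sentence representations of the two
    "views" (e.g. a sentence and its translation). Each source sentence is
    pulled toward its own translation; the other translations in the batch
    act as negative examples. All names and hyperparameters here are
    illustrative, not taken from the paper.
    """
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    # (batch, batch) cosine-similarity matrix; the diagonal holds the positive pairs.
    logits = src @ tgt.t() / temperature
    labels = torch.arange(src.size(0), device=src.device)
    return F.cross_entropy(logits, labels)


if __name__ == "__main__":
    # Random tensors stand in for encoder outputs in this sketch.
    batch, dim = 8, 768
    print(xlco_style_loss(torch.randn(batch, dim), torch.randn(batch, dim)).item())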
Pages: 3576-3588
Number of pages: 13
Related Papers
50 records in total
  • [41] Cross-lingual training of summarization systems using annotated corpora in a foreign language
    Litvak, Marina
    Last, Mark
    INFORMATION RETRIEVAL, 2013, 16 (05): 629 - 656
  • [43] Multilingual mixture attention interaction framework with adversarial training for cross-lingual SLU
    Zhang, Qichen
    Wang, Shuai
    Li, Jingmei
    NEURAL COMPUTING & APPLICATIONS, 2024, 36 (04): 1915 - 1930
  • [45] ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding
    Sun, Yu
    Wang, Shuohuan
    Li, Yukun
    Feng, Shikun
    Tian, Hao
    Wu, Hua
    Wang, Haifeng
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34: 8968 - 8975
  • [46] TACKLING THE SCORE SHIFT IN CROSS-LINGUAL SPEAKER VERIFICATION BY EXPLOITING LANGUAGE INFORMATION
    Thienpondt, Jenthe
    Desplanques, Brecht
    Demuynck, Kris
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 7187 - 7191
  • [47] FlauBERT: Unsupervised Language Model Pre-training for French
    Le, Hang
    Vial, Loic
    Frej, Jibril
    Segonne, Vincent
    Coavoux, Maximin
    Lecouteux, Benjamin
    Allauzen, Alexandre
    Crabbe, Benoit
    Besacier, Laurent
    Schwab, Didier
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020: 2479 - 2490
  • [48] Soft Language Clustering for Multilingual Model Pre-training
    Zeng, Jiali
    Jiang, Yufan
    Yin, Yongjing
    Jing, Yi
    Meng, Fandong
    Lin, Binghuai
    Cao, Yunbo
    Zhou, Jie
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023: 7021 - 7035
  • [49] Cross-lingual information retrieval model based on bilingual topic correlation
    Luo, Yuansheng
    Le, Zhongjian
    Wang, Mingwen
    Journal of Computational Information Systems, 2013, 9 (06): 2433 - 2440
  • [50] An end-to-end model for cross-lingual transformation of paralinguistic information
    Kano, Takatomo
    Takamichi, Shinnosuke
    Sakti, Sakriani
    Neubig, Graham
    Toda, Tomoki
    Nakamura, Satoshi
    MACHINE TRANSLATION, 2018, 32 (04): 353 - 368