Exploiting Word Semantics to Enrich Character Representations of Chinese Pre-trained Models

Cited by: 6
Authors
Li, Wenbiao [1 ,2 ]
Sun, Rui [1 ,2 ]
Wu, Yunfang [1 ,3 ]
Affiliations
[1] Peking Univ, MOE Key Lab Computat Linguist, Beijing, Peoples R China
[2] Peking Univ, Sch Software & Microelect, Beijing, Peoples R China
[3] Peking Univ, Sch Comp Sci, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Word semantics; Character representation; Pre-trained models;
DOI
10.1007/978-3-031-17120-8_1
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Most Chinese pre-trained models adopt characters as the basic units for downstream tasks. However, these models ignore the information carried by words and thus lose some important semantics. In this paper, we propose a new method to exploit word structure and integrate lexical semantics into the character representations of pre-trained models. Specifically, we project a word's embedding onto its internal characters' embeddings according to similarity weights. To strengthen word boundary information, we mix the representations of the internal characters within a word. After that, we apply a word-to-character alignment attention mechanism to emphasize important characters by masking unimportant ones. Moreover, to reduce the error propagation caused by word segmentation, we present an ensemble approach that combines the segmentation results given by different tokenizers. Experimental results show that our approach achieves superior performance over the basic pre-trained models BERT, BERT-wwm and ERNIE on different Chinese NLP tasks: sentiment classification, sentence pair matching, natural language inference and machine reading comprehension. Further analysis demonstrates the effectiveness of each component of our model.
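The abstract describes the projection-and-mixing step only at a high level. The following minimal PyTorch sketch shows one plausible reading of it: a word embedding is distributed over its internal characters by softmax similarity weights, and the characters within the word are then mixed to share boundary information. The function name `fuse_word_into_chars`, the dot-product similarity, and the equal-weight mixing are illustrative assumptions, not the authors' exact formulation (which additionally uses word-to-character alignment attention and a multi-tokenizer ensemble not sketched here).

```python
import torch
import torch.nn.functional as F

def fuse_word_into_chars(char_embs: torch.Tensor, word_emb: torch.Tensor) -> torch.Tensor:
    """Enrich the characters of one word with that word's embedding (illustrative sketch).

    char_embs: (num_chars, hidden) contextual embeddings of the word's characters.
    word_emb:  (hidden,)           embedding of the whole word.
    Returns:   (num_chars, hidden) enriched character representations.
    """
    # Similarity weight between the word and each internal character
    # (softmax over dot products; a hypothetical choice of similarity).
    sims = F.softmax(char_embs @ word_emb, dim=0)              # (num_chars,)

    # Project the word's semantics into each character, scaled by its weight.
    projected = char_embs + sims.unsqueeze(-1) * word_emb      # (num_chars, hidden)

    # Mix the internal characters of the word (simple averaging here) so that
    # characters belonging to the same word share boundary information.
    return 0.5 * projected + 0.5 * projected.mean(dim=0, keepdim=True)

# Toy usage: a two-character word with 8-dimensional embeddings.
chars = torch.randn(2, 8)
word = torch.randn(8)
print(fuse_word_into_chars(chars, word).shape)   # torch.Size([2, 8])
```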
Pages: 3-15
Number of pages: 13