Character, Word, or Both? Revisiting the Segmentation Granularity for Chinese Pre-trained Language Models

Cited by: 0
|
Authors
Liang, Xinnian [1 ]
Zhou, Zefan [2 ]
Huang, Hui [3 ]
Wu, Shuangzhi [4 ]
Xiao, Tong [2 ]
Yang, Muyun [3 ]
Li, Zhoujun [1 ]
Bian, Chao [4 ]
Affiliations
[1] State Key Lab of Software Development Environment, Beihang University, Beijing, China
[2] School of Computer Science and Engineering, Northeastern University, Shenyang, China
[3] Faculty of Computing, Harbin Institute of Technology, Harbin, China
[4] Lark Platform Engineering-AI, Beijing, China
Source
arXiv | 2023
Keywords
Engineering Village;
DOI
None available
Abstract
Character level - Design objectives - Language model - Objective functions - Performance - Word level
Related Papers
50 items total
  • [1] Revisiting Pre-trained Models for Chinese Natural Language Processing
    Cui, Yiming
    Che, Wanxiang
    Liu, Ting
    Qin, Bing
    Wang, Shijin
    Hu, Guoping
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 657 - 668
  • [2] Enhancing pre-trained language models with Chinese character morphological knowledge
    Zheng, Zhenzhong
    Wu, Xiaoming
    Liu, Xiangzhi
    INFORMATION PROCESSING & MANAGEMENT, 2025, 62 (01)
  • [3] Exploiting Word Semantics to Enrich Character Representations of Chinese Pre-trained Models
    Li, Wenbiao
    Sun, Rui
    Wu, Yunfang
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2022, PT I, 2022, 13551 : 3 - 15
  • [4] Efficient word segmentation for enhancing Chinese spelling check in pre-trained language model
    Li, Fangfang
    Jiang, Jie
    Tang, Dafu
    Shan, Youran
    Duan, Junwen
    Zhang, Shichao
    KNOWLEDGE AND INFORMATION SYSTEMS, 2025, 67 (01) : 603 - 632
  • [5] Impact of Morphological Segmentation on Pre-trained Language Models
    Westhelle, Matheus
    Bencke, Luciana
    Moreira, Viviane P.
    INTELLIGENT SYSTEMS, PT II, 2022, 13654 : 402 - 416
  • [6] Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese Pre-trained Language Models
    Lai, Yuxuan
    Liu, Yijia
    Feng, Yansong
    Huang, Songfang
    Zhao, Dongyan
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 1716 - 1731
  • [7] Revisiting Pre-trained Language Models and their Evaluation for Arabic Natural Language Understanding
    Ghaddar, Abbas
    Wu, Yimeng
    Bagga, Sunyam
    Rashid, Ahmad
    Bibi, Khalil
    Rezagholizadeh, Mehdi
    Xing, Chao
    Wang, Yasheng
    Xinyu, Duan
    Wang, Zhefeng
    Huai, Baoxing
    Jiang, Xin
    Liu, Qun
    Langlais, Philippe
    arXiv, 2022
  • [8] Revisiting Pre-trained Language Models and their Evaluation for Arabic Natural Language Processing
    Huawei Technologies Co., Ltd.
    Unknown
    Unknown
    Proc. Conf. Empir. Methods Nat. Lang. Process., EMNLP : 3135 - 3151
  • [9] CBAs: Character-level Backdoor Attacks against Chinese Pre-trained Language Models
    He, Xinyu
    Hao, Fengrui
    Gu, Tianlong
    Chang, Liang
    ACM TRANSACTIONS ON PRIVACY AND SECURITY, 2024, 27 (03)
  • [10] Enhancing Pre-trained Chinese Character Representation with Word-aligned Attention
    Li, Yanzeng
    Yu, Bowen
    Xue, Mengge
    Liu, Tingwen
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 3442 - 3448