Enhancing Chinese Word Segmentation via Pseudo Labels for Practicability

Cited by: 0
Authors
Huang, Kaiyu [1]
Liu, Junpeng [1]
Huang, Degen [1]
Xiong, Deyi [2, 3]
Liu, Zhuang [4]
Su, Jinsong [5]
Affiliations
[1] Dalian Univ Technol, Dalian, Peoples R China
[2] Tianjin Univ, Tianjin, Peoples R China
[3] Global Tone Commun Technol Co Ltd, Beijing, Peoples R China
[4] Dongbei Univ Finance & Econ, Dalian, Peoples R China
[5] Xiamen Univ, Xiamen, Peoples R China
Source
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021 | 2021
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Pre-trained language models (e.g., BERT) significantly alleviate two traditional challenging problems for Chinese word segmentation (CWS): segmentation ambiguity and out-of-vocabulary (OOV) words. However, such improvements are usually achieved on traditional benchmark datasets and are not close to an important goal of CWS: practicability (i.e., low complexity as a standalone task and high beneficiality to downstream tasks). To make a trade-off between traditional evaluation and practicability for CWS, we propose a semi-supervised neural method via pseudo labels. The neural method consists of a teacher model and a student model, and distills knowledge from unlabeled data to the student model so as to improve both in-domain and out-of-domain CWS. Experiments show that our proposed method not only keeps the practicability of the lightweight student model but also effectively improves segmentation performance. We also evaluate a range of heterogeneous neural architectures of CWS on downstream Chinese NLP tasks. Results of further experiments demonstrate that our proposed segmenter is reliable and practical as a pre-processing step for downstream NLP tasks at minimum cost.
Pages: 4369-4381
Number of pages: 13
Related papers
50 items in total
  • [21] A Hybrid Approach to Chinese Word Segmentation
    Chen, Bing
    Tai, Xiaoying
    2009 INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION SYSTEMS AND APPLICATIONS, PROCEEDINGS, 2009: 154-158
  • [22] Research and implementation on Chinese word segmentation
    Tang, Yunting
    Wu, Yan
    2007 International Symposium on Computer Science & Technology, Proceedings, 2007: 361-363
  • [23] CRFs based Chinese word segmentation
    Gui, Kunzhi
    Ren, Yong
    Peng, Zhaomeng
    MECHATRONICS ENGINEERING, COMPUTING AND INFORMATION TECHNOLOGY, 2014, 556-562: 4376-4379
  • [24] An integrated approach for Chinese word segmentation
    Fu, GH
    Luke, KK
    PACLIC 17: LANGUAGE, INFORMATION AND COMPUTATION, PROCEEDINGS, 2003: 80-87
  • [25] Chinese Word Segmentation with Character Abstraction
    Tian, Le
    Qiu, Xipeng
    Huang, Xuanjing
    CHINESE COMPUTATIONAL LINGUISTICS AND NATURAL LANGUAGE PROCESSING BASED ON NATURALLY ANNOTATED BIG DATA, 2013, 8208: 36-43
  • [26] Neural Word Segmentation Learning for Chinese
    Cai, Deng
    Zhao, Hai
    PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016: 409-420
  • [27] A corpus of Chinese word segmentation agreement
    Tsang, Yiu-Kei
    Yan, Ming
    Pan, Jinger
    Chan, Megan Yin Kan
    BEHAVIOR RESEARCH METHODS, 2024, 57(01)
  • [28] Efficient word segmentation for enhancing Chinese spelling check in pre-trained language model
    Li, Fangfang
    Jiang, Jie
    Tang, Dafu
    Shan, Youran
    Duan, Junwen
    Zhang, Shichao
    KNOWLEDGE AND INFORMATION SYSTEMS, 2025, 67(01): 603-632
  • [29] A Study of Chinese Word Segmentation Based on the Characteristics of Chinese
    Han, Aaron Li-Feng
    Wong, Derek F.
    Chao, Lidia S.
    He, Liangye
    Zhu, Ling
    Li, Shuo
    LANGUAGE PROCESSING AND KNOWLEDGE IN THE WEB, 2013, 8105: 111-118
  • [30] A Word Segmentation Method of Ancient Chinese Based on Word Alignment
    Che, Chao
    Zhao, Hanyu
    Wu, Xiaoting
    Zhou, Dongsheng
    Zhang, Qiang
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING (NLPCC 2019), PT I, 2019, 11838: 761-772