Enhancing Chinese Word Segmentation with Character Clustering

Cited by: 0
Authors
Liu, Yijia [1]
Che, Wanxiang [1]
Liu, Ting [1]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Res Ctr Social Comp & Informat Retrieval, Harbin, Peoples R China
Source
CHINESE COMPUTATIONAL LINGUISTICS AND NATURAL LANGUAGE PROCESSING BASED ON NATURALLY ANNOTATED BIG DATA | 2013 / Vol. 8208
Keywords
Brown clustering; Chinese word segmentation; semi-supervised learning
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Within the semi-supervised learning framework, clustering has proved to be a helpful source of features for improving system performance in NER and other NLP tasks. However, no prior work has employed clustering in word segmentation. In this paper, we propose a new approach that computes clusters of characters and uses them to assist a character-based Chinese word segmentation system. To address character ambiguity, contextual information is taken into account when the character clustering algorithm is run. Experiments show that our character clusters improve performance. We also compare our cluster features with the widely used mutual information (MI) feature. When the two features are integrated, further improvement is achieved.
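As an illustration of the mutual information baseline mentioned in the abstract, the following is a minimal sketch of pointwise mutual information (PMI) over adjacent character pairs, estimated from unlabeled text. The record does not give the paper's exact MI definition, so the formula log2(p(ab) / (p(a) p(b))) and the function name `char_bigram_pmi` are assumptions made for this example, not the authors' implementation.

```python
import math
from collections import Counter


def char_bigram_pmi(sentences):
    """Return PMI (in bits) for every adjacent character pair in `sentences`.

    Hypothetical helper for illustration: probabilities are plain relative
    frequencies with no smoothing, which is an assumption of this sketch.
    """
    unigrams = Counter()
    bigrams = Counter()
    for s in sentences:
        unigrams.update(s)              # count single characters
        bigrams.update(zip(s, s[1:]))   # count adjacent character pairs

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    pmi = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        pmi[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return pmi


if __name__ == "__main__":
    scores = char_bigram_pmi(["我们喜欢自然语言处理", "自然语言处理很有趣"])
    for pair, value in sorted(scores.items(), key=lambda kv: -kv[1])[:5]:
        print("".join(pair), round(value, 3))
```

In a character-based segmenter, a score like this would typically be discretized and attached as a feature to the boundary between two characters; how the paper combines it with the cluster features is not stated in this record.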
Pages: 52-60
Page count: 9
Related papers
50 in total
  • [1] Chinese Word Segmentation based on Conditional Random Fields with Character Clustering
    Du, Liping
    Li, Xiaoge
    Liu, Chunli
    Liu, Rui
    Fan, Xian
    Yang, Jianing
    Lin, Dayi
    Wei, Mian
    PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2016, : 258 - 261
  • [2] Chinese Word Segmentation with Character Abstraction
    Tian, Le
    Qiu, Xipeng
    Huang, Xuanjing
    CHINESE COMPUTATIONAL LINGUISTICS AND NATURAL LANGUAGE PROCESSING BASED ON NATURALLY ANNOTATED BIG DATA, 2013, 8208 : 36 - 43
  • [3] Burmese Word Segmentation with Character Clustering and CRFs
    Phyu, Myat Lay
    Hashimoto, Kiyota
    PROCEEDINGS OF 2017 14TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER SCIENCE AND SOFTWARE ENGINEERING (JCSSE), 2017,
  • [4] Multiple Character Embeddings for Chinese Word Segmentation
    Wang, Jingkang
    Zhou, Jianing
    Zhou, Jie
    Liu, Gongshen
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019): STUDENT RESEARCH WORKSHOP, 2019, : 210 - 216
  • [5] Which is essential for Chinese word segmentation: Character versus word
    Huang, Chang-Ning
    Zhao, Hai
    PACLIC 20: PROCEEDINGS OF THE 20TH PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION, 2006, : 1 - 12
  • [6] Which is essential for Chinese word segmentation: Character versus word
    Microsoft Research Asia, 49, Zhichun Road, Haidian District, Beijing-100080, China
    PACLIC - Proc. Pacific Asia Conf. Lang., Inf. Comput., 2006, (1-12):
  • [7] Federated Chinese Word Segmentation with Global Character Associations
    Tian, Yuanhe
    Chen, Guimin
    Qin, Han
    Song, Yan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 4306 - 4313
  • [8] Chinese Word Segmentation for Sub-character Representation
    Zhang, Taozheng
    Shang, Chenyang
    2021 IEEE/ACIS 21ST INTERNATIONAL FALL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE (ICIS 2021-FALL), 2021, : 177 - 181
  • [9] Improving Chinese word segmentation with character–lexicon class attention
    Zhongguo Xu
    Yang Xiang
    Neural Computing and Applications, 2025, 37 (5) : 3857 - 3867
  • [10] Enhancing Chinese Word Segmentation via Pseudo Labels for Practicability
    Huang, Kaiyu
    Liu, Junpeng
    Huang, Degen
    Xiong, Deyi
    Liu, Zhuang
    Su, Jinsong
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 4369 - 4381