An integrated approach for Chinese word segmentation

Cited by: 0
Authors
Fu, GH [1]
Luke, KK [1]
Affiliation
[1] Univ Hong Kong, Dept Linguist, Hong Kong, Hong Kong, Peoples R China
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents an integrated approach to Chinese word segmentation that performs disambiguation and unknown word identification simultaneously on the input. A hybrid model scores known word candidates and unknown word candidates equally by incorporating modified word-formation models (viz. word-juncture models and word-formation patterns) into word bigram models. With this model, different types of features are statistically computed and combined for the integrated segmentation, including the internal word-formation power of the components in a word, the affinity relations between these components, and the external contextual information. To enhance precision and avoid combinatorial explosion in word candidate construction, a filter algorithm is also given to block ineligible unknown word candidates. In this way, ambiguities and unknown words can be resolved effectively. Experimental results on the Peking University corpus show that the integrated approach outperforms the two-stage methods under discussion.
Pages: 80-87
Number of pages: 8
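
The abstract describes scoring known and unknown word candidates in a single pass, with a filter that limits the unknown candidates. The Python sketch below illustrates that general idea only and is not the authors' model: the lexicon, all log-probabilities, the per-character formation table (a crude stand-in for the word-formation models), the blocked-character filter rule, and every function name are assumptions invented for this example.

```python
# Toy resources: every entry and number below is invented for illustration.
LEXICON = {"我", "喜欢", "北京", "的"}
BIGRAM_LOGP = {
    ("<s>", "我"): -0.5,
    ("我", "喜欢"): -0.7,
    ("喜欢", "北京"): -1.0,
    ("喜欢", "<unk>"): -2.0,
}
FORMATION_LOGP = {"王": -1.0, "小": -1.5, "明": -1.0}  # stand-in for word-formation models
UNSEEN_BIGRAM = -8.0   # back-off score for bigrams missing from the table
UNSEEN_CHAR = -6.0     # back-off score for characters missing from the table
MAX_WORD_LEN = 4
BLOCKED_CHARS = {"的", "了"}  # hypothetical filter rule


def is_eligible(candidate):
    """Hypothetical filter: reject unknown candidates containing a function character."""
    return not any(ch in BLOCKED_CHARS for ch in candidate)


def formation_logp(word):
    """Sum per-character scores as a crude proxy for internal word-formation power."""
    return sum(FORMATION_LOGP.get(ch, UNSEEN_CHAR) for ch in word)


def candidates(text, start):
    """Yield known words and filtered unknown candidates that begin at `start`."""
    for end in range(start + 1, min(len(text), start + MAX_WORD_LEN) + 1):
        word = text[start:end]
        # Single characters are always kept so that a complete path exists.
        if word in LEXICON or len(word) == 1 or is_eligible(word):
            yield word


def transition(prev, word):
    """Score `word` given the previous word; unknown words share the <unk> class."""
    prev_key = prev if prev in LEXICON or prev == "<s>" else "<unk>"
    if word in LEXICON:
        return BIGRAM_LOGP.get((prev_key, word), UNSEEN_BIGRAM)
    return BIGRAM_LOGP.get((prev_key, "<unk>"), UNSEEN_BIGRAM) + formation_logp(word)


def segment(text):
    """Viterbi search over the word lattice; the state is the last word."""
    # chart[i] maps the last word ending at position i to (score, previous word).
    chart = [dict() for _ in range(len(text) + 1)]
    chart[0]["<s>"] = (0.0, None)
    for i in range(len(text)):
        for prev, (prev_score, _) in chart[i].items():
            for word in candidates(text, i):
                score = prev_score + transition(prev, word)
                j = i + len(word)
                if word not in chart[j] or score > chart[j][word][0]:
                    chart[j][word] = (score, prev)
    # Follow back-pointers from the best final state.
    pos = len(text)
    word = max(chart[pos], key=lambda w: chart[pos][w][0])
    words = []
    while pos > 0:
        words.append(word)
        prev = chart[pos][word][1]
        pos -= len(word)
        word = prev
    return words[::-1]


print(segment("我喜欢王小明"))
```

Because known and unknown candidates compete in the same lattice, the out-of-lexicon string 王小明 can win as a single word without a separate second stage. With real data, the toy tables would be replaced by statistics estimated from a segmented corpus.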