An integrated approach for Chinese word segmentation

Times cited: 0
Authors
Fu, GH [1 ]
Luke, KK [1 ]
Affiliation
[1] University of Hong Kong, Department of Linguistics, Hong Kong, People's Republic of China
DOI
Not available
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents an integrated approach to Chinese word segmentation that performs disambiguation and unknown word identification simultaneously on the input. A hybrid model scores known and unknown word candidates on an equal footing by incorporating modified word-formation models (namely, word-juncture models and word-formation patterns) into word bigram models. Within this model, three types of features are computed statistically and combined for integrated segmentation: the internal word-formation power of the components of a word, the affinity relations between these components, and the external contextual information. To improve precision and to avoid combinatorial explosion when constructing word candidates, a filtering algorithm is also given that blocks ineligible unknown word candidates. In this way, both ambiguities and unknown words can be resolved effectively. Experimental results on the Peking University corpus show that the integrated approach outperforms the two-stage methods it is compared against.
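The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea under assumed simplifications, not the authors' method. Known and unknown word candidates are placed in one lattice and decoded together with Viterbi search under a single word-bigram model. Every unknown candidate is mapped to a shared `<UNK>` class token and additionally pays a word-formation score built from per-character position probabilities (a simplified stand-in for "word-formation power"; the paper's word-juncture models and affinity features are not modelled). A hypothetical filter (maximum length `MAX_UNK_LEN`, minimum score `MIN_FORMATION`) blocks ineligible unknown candidates before decoding. All names, thresholds and probabilities are hypothetical toy values, not taken from the paper or estimated from the Peking University corpus.

```python
import math
from typing import Dict, List, Tuple

# --- Toy, hypothetical parameters (not from the paper or the PKU corpus) ---
LEXICON = {"研究", "研究生", "生命", "命", "起源"}

# Word-bigram log-probabilities. "<s>" marks the sentence start; every unknown
# word candidate is mapped to the shared class token "<UNK>".
BIGRAM: Dict[Tuple[str, str], float] = {
    ("<s>", "研究"): -1.0,
    ("<s>", "研究生"): -1.5,
    ("<s>", "<UNK>"): -2.0,
    ("<UNK>", "研究"): -1.5,
    ("<UNK>", "研究生"): -2.0,
    ("研究", "生命"): -1.2,
    ("研究生", "命"): -4.0,
    ("生命", "起源"): -0.8,
    ("命", "起源"): -3.0,
}
BACKOFF = -8.0  # crude constant log-prob for unseen bigrams

# P(position | character): B/M/E = begin/middle/end of a multi-character word,
# S = single-character word. A simplified stand-in for word-formation power.
CHAR_POS: Dict[str, Dict[str, float]] = {
    "张": {"B": 0.7, "M": 0.1, "E": 0.1, "S": 0.1},
    "伟": {"B": 0.2, "M": 0.2, "E": 0.5, "S": 0.1},
}
DEFAULT_POS = 0.25     # uniform fallback for characters without statistics
MAX_UNK_LEN = 3        # hypothetical filter: longest unknown-word candidate allowed
MIN_FORMATION = -2.0   # hypothetical filter: minimum log word-formation score


def pos_prob(ch: str, pos: str) -> float:
    return CHAR_POS.get(ch, {}).get(pos, DEFAULT_POS)


def formation_score(word: str) -> float:
    """Log word-formation score of a candidate from per-character position probabilities."""
    if len(word) == 1:
        return math.log(pos_prob(word, "S"))
    score = math.log(pos_prob(word[0], "B")) + math.log(pos_prob(word[-1], "E"))
    for ch in word[1:-1]:
        score += math.log(pos_prob(ch, "M"))
    return score


def candidates(sentence: str, start: int) -> List[Tuple[str, bool]]:
    """(word, is_unknown) candidates starting at `start`, after the unknown-word filter."""
    out: List[Tuple[str, bool]] = []
    for end in range(start + 1, len(sentence) + 1):
        word = sentence[start:end]
        if word in LEXICON:
            out.append((word, False))
        elif len(word) == 1:
            out.append((word, True))  # single-character fallback keeps the lattice connected
        elif len(word) <= MAX_UNK_LEN and formation_score(word) >= MIN_FORMATION:
            out.append((word, True))  # multi-character unknown word that passed the filter
    return out


def segment(sentence: str) -> List[str]:
    """Viterbi search over the candidate lattice under the word-bigram model."""
    # chart[i] maps the last bigram token to (best log score, words covering sentence[:i])
    chart: List[Dict[str, Tuple[float, List[str]]]] = [{} for _ in range(len(sentence) + 1)]
    chart[0]["<s>"] = (0.0, [])
    for i in range(len(sentence)):
        for prev, (score, words) in chart[i].items():
            for word, unknown in candidates(sentence, i):
                token = "<UNK>" if unknown else word
                step = BIGRAM.get((prev, token), BACKOFF)
                if unknown:
                    step += formation_score(word)  # unknown words also pay their formation score
                j = i + len(word)
                if token not in chart[j] or score + step > chart[j][token][0]:
                    chart[j][token] = (score + step, words + [word])
    return max(chart[-1].values(), key=lambda state: state[0])[1]


print(segment("张伟研究生命起源"))  # with these toy values: ['张伟', '研究', '生命', '起源']
```

With these toy values, the filter discards candidates such as 张伟研 and 究生 before decoding because their formation scores fall below `MIN_FORMATION`, which keeps the lattice small. The unknown name 张伟 and the ambiguity between 研究/生命 and 研究生/命 are then resolved in the same search rather than in two separate stages.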
Pages: 80-87
Number of pages: 8