A Local Generative Model for Chinese Word Segmentation

Authors
Zhang, Kaixu [1]
Sun, Maosong [1]
Xue, Ping [2]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Tsinghua Natl Lab Informat Sci & Technol, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China
[2] The Boeing Co, Chicago, IL USA
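Keywords
probability model; natural language processing; Chinese word segmentation
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract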
This paper presents a local generative model for Chinese word segmentation that learns faster than discriminative models and supports unsupervised learning. It can also make use of larger resources. In this model, four successive characters are used to determine whether a character interval should be a word boundary or not. For unsupervised learning, the Gibbs sampling algorithm is applied together with three additional rules. Besides words, the word candidates generated by our model can improve the performance of Chinese information retrieval. Experiments show that, in supervised learning, our method outperforms a language-model-based method, and its performance on one corpus is better than the best result reported in the SIGHAN bakeoff 05. In unsupervised learning, our method achieves performance comparable to the state-of-the-art method.
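The abstract does not give the model's equations, so the following is only a rough illustration of the idea that four successive characters decide each character interval. It is a minimal count-based sketch, not the authors' method: the names `PAD`, `windows`, `train`, and `segment` are hypothetical, the padding symbol and the decision rule (compare joint counts of a window with and without a boundary) are assumptions, and smoothing, the Gibbs sampler, and the three additional rules are all omitted.

```python
from collections import Counter

PAD = "#"  # hypothetical padding symbol for sentence edges (assumption)


def windows(chars):
    """Yield (i, window) for each interval between chars[i] and chars[i + 1].

    The window is the four characters around the interval:
    two on the left and two on the right, padded at the edges.
    """
    padded = [PAD] + list(chars) + [PAD]
    for i in range(len(chars) - 1):
        yield i, tuple(padded[i:i + 4])


def train(corpus):
    """Count (window, is_boundary) pairs from sentences given as word lists."""
    counts = Counter()
    for words in corpus:
        chars = "".join(words)
        cuts, pos = set(), 0
        for word in words[:-1]:
            pos += len(word)
            cuts.add(pos)  # a boundary lies just before chars[pos]
        for i, win in windows(chars):
            counts[(win, (i + 1) in cuts)] += 1
    return counts


def segment(chars, counts):
    """Cut wherever the window was seen more often with a boundary than without.

    Comparing raw counts is equivalent to comparing the unsmoothed joint
    estimates P(window, boundary) and P(window, no boundary).
    Unseen windows default to "no boundary" (assumption).
    """
    if not chars:
        return []
    words, start = [], 0
    for i, win in windows(chars):
        if counts[(win, True)] > counts[(win, False)]:
            words.append(chars[start:i + 1])
            start = i + 1
    words.append(chars[start:])
    return words


corpus = [["我们", "喜欢", "北京"], ["我们", "在", "北京"]]
model = train(corpus)
print(segment("我们喜欢北京", model))
```

Because every decision depends only on a fixed four-character window, training reduces to counting, which is one plausible reading of why such a model could learn faster than a discriminative one. A practical version would need smoothing or back-off for windows never seen in training.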
Pages: 420 / +
Page count: 2