A Local Generative Model for Chinese Word Segmentation

Authors
Zhang, Kaixu [1]
Sun, Maosong [1]
Xue, Ping [2]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Tsinghua Natl Lab Informat Sci & Technol, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China
[2] The Boeing Co, Chicago, IL USA
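Source
Keywords
probability model; natural language processing; Chinese word segmentation
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract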
This paper presents a local generative model for Chinese word segmentation. The model learns faster than discriminative models, supports unsupervised learning, and can make use of larger resources. In this model, four successive characters are used to determine whether a character interval should be a word boundary or not. The Gibbs sampling algorithm, together with three additional rules, is applied for unsupervised learning. Besides words, the word candidates generated by the model can improve the performance of Chinese information retrieval. Experiments show that in supervised learning the method outperforms a language-model-based method, and its performance on one corpus is better than the best result reported in SIGHAN Bakeoff 05. In unsupervised learning, the method achieves performance comparable to the state-of-the-art method.
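The four-character decision described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not give the probability formulation, so the code below assumes a simple count-based estimate in which each interval between two characters is labelled by comparing how often its four-character window was seen with and without a boundary in segmented training data. The names `windows`, `boundary_labels`, `train`, and `segment`, the padding symbol `#`, and the back-off weight to the inner two characters are all hypothetical choices made for illustration.

```python
from collections import Counter

PAD = "#"  # hypothetical padding symbol for sentence edges (assumption)


def windows(chars):
    """Yield the four-character window around each interval between characters."""
    padded = [PAD] + list(chars) + [PAD]
    for i in range(1, len(padded) - 2):
        # Interval lies between padded[i] and padded[i + 1].
        yield tuple(padded[i - 1:i + 3])


def boundary_labels(words):
    """Return one bool per interval: True if a word boundary falls there."""
    cuts, pos = set(), 0
    for w in words:
        pos += len(w)
        cuts.add(pos)
    total = sum(len(w) for w in words)
    return [k in cuts for k in range(1, total)]


def train(corpus):
    """Count windows (and their inner character pair) per boundary label."""
    full = {True: Counter(), False: Counter()}
    inner = {True: Counter(), False: Counter()}
    for words in corpus:
        chars = "".join(words)
        for win, label in zip(windows(chars), boundary_labels(words)):
            full[label][win] += 1
            inner[label][win[1:3]] += 1
    return full, inner


def segment(chars, model, backoff=0.1):
    """Cut at each interval whose boundary score exceeds its non-boundary score.

    The score is the joint count of (window, label), backed off to the
    inner two characters with an assumed weight; ties mean no cut.
    """
    full, inner = model
    words, start = [], 0
    for k, win in enumerate(windows(chars), start=1):
        mid = win[1:3]
        score = {b: full[b][win] + backoff * inner[b][mid] for b in (True, False)}
        if score[True] > score[False]:
            words.append(chars[start:k])
            start = k
    if start < len(chars):
        words.append(chars[start:])
    return words


# Toy segmented corpus, invented for illustration only.
toy_corpus = [["我们", "喜欢", "猫"], ["我们", "喜欢", "狗"]]
toy_model = train(toy_corpus)
print(segment("我们喜欢狗", toy_model))
```

Because each decision depends only on a local window, training reduces to a single counting pass over the corpus, which is consistent with the abstract's claim of fast learning. The Gibbs sampling procedure and the three additional rules used for unsupervised learning are not specified in the abstract and are not sketched here.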
Pages: 420 / +
Page count: 2