A Local Generative Model for Chinese Word Segmentation

Cited by: 0
Authors
Zhang, Kaixu [1]
Sun, Maosong [1]
Xue, Ping [2]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Tsinghua Natl Lab Informat Sci & Technol, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China
[2] The Boeing Co, Chicago, IL USA
Source
Keywords
probability model; natural language processing; Chinese word segmentation;
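DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract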
This paper presents a local generative model for Chinese word segmentation. The model has a faster learning process than discriminative models, supports unsupervised learning, and is able to make use of larger resources. In this model, four successive characters are used to determine whether a character interval should be a word boundary or not. For unsupervised learning, the Gibbs sampling algorithm is applied together with three additional rules. Besides words, the word candidates generated by the model can improve the performance of Chinese information retrieval. Experiments show that in supervised learning the method outperforms a language-model-based method, and that its performance on one corpus is better than the best result reported in SIGHAN Bakeoff 05. In unsupervised learning, the method achieves performance comparable to that of the state-of-the-art method.
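The abstract does not specify the model's equations, so the following is only a minimal sketch of the general idea that a window of four successive characters (two on each side of a character interval) decides whether that interval is a word boundary. It assumes a simple smoothed relative-frequency estimate learned from a segmented corpus; the function names, the `#` padding symbol, and the 0.5 threshold are all hypothetical, and the Gibbs sampler and the three additional rules used for unsupervised learning are not reproduced here.

```python
from collections import defaultdict

PAD = "#"  # hypothetical padding symbol for sentence edges (an assumption)


def windows(text):
    """Yield (i, window) for each interval between text[i-1] and text[i].

    The window holds the two characters before and the two after the interval.
    """
    padded = PAD * 2 + text + PAD * 2
    for i in range(1, len(text)):
        yield i, padded[i:i + 4]


def train(segmented_sentences):
    """Count, per four-character window, how often the interval is a boundary."""
    counts = defaultdict(lambda: [0, 0])  # window -> [non-boundary, boundary]
    for words in segmented_sentences:
        text = "".join(words)
        boundaries, pos = set(), 0
        for word in words[:-1]:
            pos += len(word)
            boundaries.add(pos)
        for i, win in windows(text):
            counts[win][int(i in boundaries)] += 1
    return counts


def boundary_prob(counts, win):
    """Add-one smoothed estimate of P(boundary | window); 0.5 for unseen windows."""
    no, yes = counts.get(win, (0, 0))
    return (yes + 1) / (no + yes + 2)


def segment(counts, text):
    """Split text wherever the local estimate exceeds 0.5 (assumed threshold)."""
    words, start = [], 0
    for i, win in windows(text):
        if boundary_prob(counts, win) > 0.5:
            words.append(text[start:i])
            start = i
    words.append(text[start:])
    return words


model = train([["我们", "喜欢", "学习"]])
result = segment(model, "我们喜欢学习")
```

Because each decision depends only on a local window, training in this sketch is a single counting pass over the corpus, which is one plausible reading of why a local model could learn quickly; the paper itself should be consulted for the actual formulation.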
Pages: 420 / +
Page count: 2
Related papers
50 in total
  • [31] A Word Segmentation Method of Ancient Chinese Based on Word Alignment
    Che, Chao
    Zhao, Hanyu
    Wu, Xiaoting
    Zhou, Dongsheng
    Zhang, Qiang
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING (NLPCC 2019), PT I, 2019, 11838 : 761 - 772
  • [32] New Cyber Word Discovery Using Chinese Word Segmentation
    Wang, Hao
    Wang, Bing
    Zou, MengYu
    Duan, JianYong
    PROCEEDINGS OF 2019 IEEE 3RD INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC 2019), 2019, : 970 - 975
  • [33] Which is essential for Chinese word segmentation: Character versus word
    Huang, Chang-Ning
    Zhao, Hai
    PACLIC 20: PROCEEDINGS OF THE 20TH PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION, 2006, : 1 - 12
  • [34] Chinese to Braille Translation Based on Braille Word Segmentation Using Statistical Model
    王向东
    杨阳
    张金超
    姜文斌
    刘宏
    钱跃良
    Journal of Shanghai Jiaotong University (Science), 2017, 22 (01) : 82 - 86
  • [35] A Deep Convolutional Neural Model for Character-Based Chinese Word Segmentation
    Xie, Zhipeng
    Hu, Junfeng
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2017, 2018, 10619 : 380 - 392
  • [36] The Research of Chinese Automatic Word Segmentation In Hierarchical Model Dictionary Binary Tree
    Luo XianGang
    Luo Jin
    Xie Zhong
    FIRST INTERNATIONAL WORKSHOP ON DATABASE TECHNOLOGY AND APPLICATIONS, PROCEEDINGS, 2009, : 321 - 324
  • [37] Application of MPSO-based Neural Network model in Chinese word segmentation
    Cheng, Xiaorong
    Wang, Dong
    Xie, Kun
    ICICTA: 2009 SECOND INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTATION TECHNOLOGY AND AUTOMATION, VOL I, PROCEEDINGS, 2009, : 295 - 298
  • [38] CWSXLNet: A Sentiment Analysis Model Based on Chinese Word Segmentation Information Enhancement
    Guo, Shiqian
    Huang, Yansun
    Huang, Baohua
    Yang, Linda
    Zhou, Cong
    APPLIED SCIENCES-BASEL, 2023, 13 (06):
  • [39] A Data-Driven Model for Automated Chinese Word Segmentation and POS Tagging
    Xu, Qing
    Wang, Zhiyou
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022
  • [40] A Chinese word segmentation model for energy literature based on Conditional Random Fields
    Zhao, Liujun
    Kong, Weizheng
    Chai, Bo
    2018 2ND IEEE CONFERENCE ON ENERGY INTERNET AND ENERGY SYSTEM INTEGRATION (EI2), 2018, : 785 - 788