An integrated approach for Chinese word segmentation

Cited by: 0
Authors
Fu, GH [1 ]
Luke, KK [1 ]
Affiliations
[1] Univ Hong Kong, Dept Linguist, Hong Kong, Hong Kong, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents an integrated approach to Chinese word segmentation that performs disambiguation and unknown word identification simultaneously on the input. A hybrid model scores known and unknown word candidates on an equal footing by incorporating modified word-formation models (viz. word-juncture models and word-formation patterns) into word bigram models. With this model, different types of features are statistically computed and combined for the integrated segmentation, including the internal word-formation power of the components of a word, the affinity relations between these components, and the external contextual information. To improve precision and avoid combinatorial explosion in word candidate construction, a filter algorithm is also given to block ineligible unknown word candidates. In this way, ambiguities and unknown words can be resolved effectively. Experimental results on the Peking University corpus show that the integrated approach outperforms the other two-stage methods under discussion.
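The general idea in the abstract (one search that scores known and unknown word candidates together, with a filter on unknown candidates) can be illustrated with a minimal sketch. This is not the authors' model: the lexicon, the probabilities, the per-character position tags (B/M/E/S) standing in for word-formation patterns, the flat back-off penalty, and names such as `formation_logp`, `candidates`, and `segment` are all assumptions made up for illustration.

```python
# Toy resources. Every entry and number below is invented for illustration.
START = "<s>"
LEXICON = {"研究", "研究生", "生命", "命", "的", "起源"}
BIGRAM_LOGP = {  # hypothetical log P(word | previous word)
    (START, "研究"): -1.0, ("研究", "生命"): -1.0, ("生命", "的"): -1.0,
    ("的", "起源"): -1.0, (START, "研究生"): -2.0, ("研究生", "命"): -6.0,
    ("命", "的"): -3.0,
}
BACKOFF_LOGP = -8.0       # flat penalty for unseen bigrams (assumption)
CHAR_TAG_LOGP = {         # hypothetical log P(position tag | character)
    ("王", "B"): -1.0, ("小", "M"): -2.0, ("明", "E"): -1.5,
}
DEFAULT_CHAR_LOGP = -5.0  # score for unseen (character, tag) pairs
FILTER_THRESHOLD = -5.5   # unknown multi-character candidates below this are blocked


def tags_for(word):
    """Position tags: S for a single character, else B, M..., E."""
    if len(word) == 1:
        return ["S"]
    return ["B"] + ["M"] * (len(word) - 2) + ["E"]


def formation_logp(word):
    """Toy word-formation score: sum of per-character tag log-probabilities."""
    return sum(CHAR_TAG_LOGP.get((ch, tag), DEFAULT_CHAR_LOGP)
               for ch, tag in zip(word, tags_for(word)))


def candidates(text, start, max_len):
    """Yield (word, is_known) for substrings starting at `start`.

    Single characters are always kept so that a full path always exists;
    longer unknown candidates must pass the filter.
    """
    for end in range(start + 1, min(start + max_len, len(text)) + 1):
        word = text[start:end]
        if word in LEXICON:
            yield word, True
        elif len(word) == 1 or formation_logp(word) >= FILTER_THRESHOLD:
            yield word, False


def transition_logp(prev, word, known):
    """Bigram score, plus the formation score when the word is unknown."""
    score = BIGRAM_LOGP.get((prev, word), BACKOFF_LOGP)
    return score if known else score + formation_logp(word)


def segment(text, max_len=4):
    """Viterbi search over the word lattice; returns the best word sequence."""
    if not text:
        return []
    n = len(text)
    # best[i] maps the last word ending at position i to (score, backpointer).
    best = [dict() for _ in range(n + 1)]
    best[0][START] = (0.0, None)
    for i in range(n):
        for prev, (score, _) in best[i].items():
            for word, known in candidates(text, i, max_len):
                j = i + len(word)
                s = score + transition_logp(prev, word, known)
                if word not in best[j] or s > best[j][word][0]:
                    best[j][word] = (s, (i, prev))
    # Trace back from the best-scoring word that ends at the last position.
    word = max(best[n], key=lambda w: best[n][w][0])
    pos, words = n, []
    while word != START:
        words.append(word)
        pos, word = best[pos][word][1]
    return words[::-1]
```

With these toy numbers, `segment("研究生命的起源")` prefers 研究 / 生命 / 的 / 起源 over the competing reading that starts with 研究生, and `segment("王小明研究生命")` keeps the out-of-lexicon string 王小明 as a single word because its formation score passes the filter. Handling both kinds of candidate in one lattice is what distinguishes this from a two-stage pipeline, where unknown words would be detected only after a first segmentation pass.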
Pages: 80-87
Page count: 8
Related papers
50 in total
  • [21] Chinese Word Segmentation with Character Abstraction
    Tian, Le
    Qiu, Xipeng
    Huang, Xuanjing
    CHINESE COMPUTATIONAL LINGUISTICS AND NATURAL LANGUAGE PROCESSING BASED ON NATURALLY ANNOTATED BIG DATA, 2013, 8208 : 36 - 43
  • [22] Neural Word Segmentation Learning for Chinese
    Cai, Deng
    Zhao, Hai
    PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016, : 409 - 420
  • [23] A corpus of Chinese word segmentation agreement
    Tsang, Yiu-Kei
    Yan, Ming
    Pan, Jinger
    Chan, Megan Yin Kan
    BEHAVIOR RESEARCH METHODS, 2024, 57 (01)
  • [24] A No-Word-Segmentation Hierarchical Clustering Approach to Chinese Web search results
    Zhang, Hui
    Zhao, Liping
    Liu, Rui
    Wang, Deqing
    INFORMATION RETRIEVAL TECHNOLOGY, 2008, 4993 : 573 - 577
  • [25] Context-based Approach for Covering Ambiguity Resolution in Chinese Word Segmentation
    Feng, Su-qin
    Hou, Su-qin
    ICIC 2009: SECOND INTERNATIONAL CONFERENCE ON INFORMATION AND COMPUTING SCIENCE, VOL 2, PROCEEDINGS: IMAGE ANALYSIS, INFORMATION AND SIGNAL PROCESSING, 2009, : 43 - +
  • [26] A Study of Chinese Word Segmentation Based on the Characteristics of Chinese
    Han, Aaron Li-Feng
    Wong, Derek F.
    Chao, Lidia S.
    He, Liangye
    Zhu, Ling
    Li, Shuo
    LANGUAGE PROCESSING AND KNOWLEDGE IN THE WEB, 2013, 8105 : 111 - 118
  • [27] A Hybrid Approach For Word Segmentation
    Mohammed, Ammar
    Karam, Mohamed
    Hefny, Hesham
    2015 SAI INTELLIGENT SYSTEMS CONFERENCE (INTELLISYS), 2015, : 232 - 238
  • [28] A Word Segmentation Method of Ancient Chinese Based on Word Alignment
    Che, Chao
    Zhao, Hanyu
    Wu, Xiaoting
    Zhou, Dongsheng
    Zhang, Qiang
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING (NLPCC 2019), PT I, 2019, 11838 : 761 - 772
  • [29] New Cyber Word Discovery Using Chinese Word Segmentation
    Wang, Hao
    Wang, Bing
    Zou, MengYu
    Duan, JianYong
    PROCEEDINGS OF 2019 IEEE 3RD INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC 2019), 2019, : 970 - 975
  • [30] Which is essential for Chinese word segmentation: Character versus word
    Huang, Chang-Ning
    Zhao, Hai
    PACLIC 20: PROCEEDINGS OF THE 20TH PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION, 2006, : 1 - 12