Towards Accurate and Efficient Chinese Part-of-Speech Tagging

Cited by: 9
Authors
Sun, Weiwei [1]
Wan, Xiaojun [1]
Affiliations
[1] Peking Univ, Inst Comp Sci & Technol, MOE Key Lab Computat Linguist, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1162/COLI_a_00253
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
From the perspective of structural linguistics, we explore paradigmatic and syntagmatic lexical relations for Chinese POS tagging, an important and challenging task for Chinese language processing. Paradigmatic lexical relations are explicitly captured by word clustering on large-scale unlabeled data and are used to design new features to enhance a discriminative tagger. Syntagmatic lexical relations are implicitly captured by syntactic parsing in the constituency formalism and are utilized via system combination. Experiments on the Penn Chinese Treebank demonstrate the importance of both paradigmatic and syntagmatic relations. Our linguistically motivated, hybrid approaches yield a relative error reduction of 18% in total over state-of-the-art baselines. Despite their effectiveness in boosting accuracy, hybrid systems rely on computationally expensive parsers, which makes them inappropriate for many realistic NLP applications. In this article, we are also concerned with improving tagging efficiency at test time. In particular, we explore unlabeled data to transfer the predictive power of hybrid models to simple sequence models. Specifically, hybrid systems are utilized to create large-scale pseudo training data for cheap models. Experimental results illustrate that the re-compiled models not only achieve high accuracy with respect to per-token classification, but also serve well as a front-end to a parser.
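The pseudo-training-data idea in the abstract can be illustrated with a toy sketch. This is not the authors' system: `teacher_tag` is a hypothetical stand-in for an expensive hybrid tagger (here just a small hand-written lexicon), and the "cheap model" is a most-frequent-tag lookup rather than a real sequence model. The tag names (PN, VV, NR, NN) follow Penn Chinese Treebank conventions; everything else is an assumption made for illustration.

```python
from collections import Counter, defaultdict


def teacher_tag(sentence):
    """Hypothetical stand-in for an accurate but slow hybrid tagger."""
    lexicon = {"我": "PN", "爱": "VV", "北京": "NR",
               "他": "PN", "去": "VV", "上海": "NR"}
    # Unknown words fall back to the common-noun tag.
    return [lexicon.get(word, "NN") for word in sentence]


def train_cheap_tagger(pseudo_data):
    """Fit a most-frequent-tag-per-word model on teacher-labeled sentences."""
    counts = defaultdict(Counter)
    for words, tags in pseudo_data:
        for word, tag in zip(words, tags):
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def cheap_tag(model, sentence, default="NN"):
    """Tag a sentence with the cheap model; no teacher call is needed."""
    return [model.get(word, default) for word in sentence]


# Step 1: label unlabeled, pre-segmented sentences with the teacher.
unlabeled = [["我", "爱", "北京"], ["他", "去", "上海"]]
pseudo_data = [(sent, teacher_tag(sent)) for sent in unlabeled]

# Step 2: train the cheap model on the pseudo-labeled data.
model = train_cheap_tagger(pseudo_data)

# Step 3: at test time only the cheap model runs.
result = cheap_tag(model, ["他", "爱", "上海"])
print(result)
```

In this sketch the expensive component is consulted only while building `pseudo_data`, so test-time cost depends on the cheap model alone, which is the efficiency argument the abstract makes.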
Pages: 391-419
Page count: 29
Related papers
  • [1] Part-of-speech tagging
    Martinez, Angel R.
    WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS, 2012, 4 (01): 107-113
  • [2] Chinese Part-of-speech Tagging Based on Fusion Model
    Sun, Guang-Lu
    Lang, Fei
    Qiao, Pei-Li
    Xu, Zhi-Ming
    PROCEEDINGS OF THE 11TH JOINT CONFERENCE ON INFORMATION SCIENCES, 2008
  • [3] Accurate Part-of-Speech Tagging via Conditional Random Field
    Zhang, Jinmei
    Zhang, Yucheng
    INTERNET OF VEHICLES - TECHNOLOGIES AND SERVICES, 2016, 10036: 217-224
  • [4] Fast and accurate part-of-speech tagging: The SVM approach revisited
    Giménez, J
    Márquez, L
    Recent Advances in Natural Language Processing III, 2004, 260: 153-162
  • [5] Toward a Standardized and More Accurate Indonesian Part-of-Speech Tagging
    Kurniawan, Kemal
    Aji, Alham Fikri
    2018 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2018: 303-307
  • [6] Part-of-speech tagging for Swedish
    Prütz, K
    PARALLEL CORPORA, PARALLEL WORLDS, 2002, (43): 201-206
  • [7] An integrated approach to Chinese word segmentation and part-of-speech tagging
    Sun, Maosong
    Xu, Dongliang
    Tsou, Benjamin K.
    Lu, Huaming
    COMPUTER PROCESSING OF ORIENTAL LANGUAGES, PROCEEDINGS: BEYOND THE ORIENT: THE RESEARCH CHALLENGES AHEAD, 2006, 4285 : 299 - +
  • [9] Repairing errors for Chinese word segmentation and part-of-speech tagging
    Yao, TF
    Ding, W
    Erbach, G
    2002 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-4, PROCEEDINGS, 2002: 1881-1886
  • [10] A method integrating rule and HMM for Chinese part-of-speech tagging
    Hui Ning
    Hua Yang
    Zhihui Li
    ICIEA 2007: 2ND IEEE CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS, VOLS 1-4, PROCEEDINGS, 2007: 723-725