Towards Accurate and Efficient Chinese Part-of-Speech Tagging

Cited by: 9
Authors
Sun, Weiwei [1]
Wan, Xiaojun [1]
Affiliations
[1] Peking Univ, Inst Comp Sci & Technol, MOE Key Lab Computat Linguist, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1162/COLI_a_00253
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
From the perspective of structural linguistics, we explore paradigmatic and syntagmatic lexical relations for Chinese POS tagging, an important and challenging task in Chinese language processing. Paradigmatic lexical relations are explicitly captured by word clustering on large-scale unlabeled data and are used to design new features that enhance a discriminative tagger. Syntagmatic lexical relations are implicitly captured by syntactic parsing in the constituency formalism and are utilized via system combination. Experiments on the Penn Chinese Treebank demonstrate the importance of both paradigmatic and syntagmatic relations. Our linguistically motivated hybrid approaches yield a total relative error reduction of 18% over state-of-the-art baselines. Although hybrid systems are effective at boosting accuracy, their computationally expensive parsers make them inappropriate for many realistic NLP applications. In this article, we are therefore also concerned with improving tagging efficiency at test time. In particular, we use unlabeled data to transfer the predictive power of hybrid models to simple sequence models: the hybrid systems are used to create large-scale pseudo training data for the cheap models. Experimental results show that the re-compiled models not only achieve high per-token classification accuracy, but also serve well as a front end to a parser.
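As a rough illustration of how word clusters might be turned into tagger features, the sketch below builds cluster-based feature strings for one token and its neighbours. The abstract does not specify the clustering algorithm or the feature templates, so everything here is an assumption made for illustration: the function name `cluster_features`, the bit-string cluster IDs (in the style of hierarchical clustering), the one-word context window, and the prefix lengths are all hypothetical.

```python
from typing import Dict, List, Sequence


def cluster_features(
    tokens: List[str],
    clusters: Dict[str, str],
    i: int,
    prefix_lens: Sequence[int] = (4, 6),
) -> List[str]:
    """Build cluster-based features for tokens[i] and its direct neighbours.

    `clusters` is a hypothetical word -> cluster-ID mapping, assumed to be
    learned beforehand from unlabeled text. IDs are assumed to be bit
    strings, so prefixes give coarser clusters.
    """
    feats: List[str] = []
    for offset in (-1, 0, 1):
        j = i + offset
        if 0 <= j < len(tokens):
            # Words never seen during clustering fall back to <UNK>.
            cid = clusters.get(tokens[j], "<UNK>")
        else:
            # Positions outside the sentence get a padding symbol.
            cid = "<PAD>"
        feats.append(f"cluster[{offset}]={cid}")
        if cid not in ("<UNK>", "<PAD>"):
            # Prefixes of the bit string act as coarser-grained clusters.
            for n in prefix_lens:
                feats.append(f"cluster[{offset}][:{n}]={cid[:n]}")
    return feats


if __name__ == "__main__":
    # Toy, hand-written cluster IDs; not taken from the article.
    toy_clusters = {"我": "0010", "喜欢": "110101", "苹果": "0111"}
    print(cluster_features(["我", "喜欢", "苹果"], toy_clusters, 0, prefix_lens=(2,)))
```

Features of this kind would be added to a discriminative tagger's existing word and character features, so that rare or unseen words can share statistics with frequent words in the same cluster.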
Pages: 391-419
Number of pages: 29
Related papers
50 in total
  • [21] Corpus based part-of-speech tagging
    Lv, Chengyao
    Liu, Huihua
    Dong, Yuanxing
    Chen, Yunliang
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2016, 19 (03) : 647 - 654
  • [22] Part-of-speech tagging without training
    Bressan, S
    Indradjaja, LS
    INTELLIGENCE IN COMMUNICATION SYSTEMS, 2004, 3283 : 112 - 119
  • [23] Domain adaptation in part-of-speech tagging
    Institute of Exact and Natural Sciences, Federal University of Pará , Pará, Brazil
    Unknown
    Emerging Applic. of Nat. Lang. Proc.: Concepts and New Res., (52-72):
  • [24] Part-of-Speech Tagging for Azerbaijani Language
    Mammadov, Samir
    Rustamov, Samir
    Mustafali, Ali
    Sadigov, Ziyaddin
    Mollayev, Rasim
    Mammadov, Zamir
    2018 IEEE 12TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES (AICT), 2018, : 40 - 45
  • [25] Part-of-Speech Tagging by Latent Analogy
    Bellegarda, Jerome R.
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2010, 4 (06) : 985 - 993
  • [26] Research on the System of Jointing Chinese Word Segmentation with Part-of-speech Tagging
    Li, Qin
    Wei, Wei
    2013 SIXTH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID), VOL 1, 2013, : 387 - 390
  • [27] On Certain Aspects of Kazakh Part-of-Speech Tagging
    Makazhanov, Aibek
    Yessenbayev, Zhandos
    Sabyrgaliyev, Islam
    Sharafudinov, Anuar
    Makhambetov, Olzhas
    2014 IEEE 8TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES (AICT), 2014, : 240 - 243
  • [28] Part-of-Speech Tagging Using Evolutionary Computation
    Silva, Ana Paula
    Silva, Arlindo
    Rodrigues, Irene
    NATURE INSPIRED COOPERATIVE STRATEGIES FOR OPTIMIZATION (NICSO 2013), 2014, 512 : 167 - +
  • [29] Part-of-Speech (POS) Tagging for the Nyishi Language
    Siram, Joyir
    Sambyo, Koj
    Sarkar, Achyuth
    ADVANCES IN INFORMATION COMMUNICATION TECHNOLOGY AND COMPUTING, AICTC 2021, 2022, 392 : 191 - 199
  • [30] Impact of imperfect OCR on part-of-speech tagging
    Lin, XF
    SEVENTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS I AND II, PROCEEDINGS, 2003, : 284 - 288