Towards Accurate and Efficient Chinese Part-of-Speech Tagging

Cited by: 9
Authors
Sun, Weiwei [1]
Wan, Xiaojun [1]
Affiliations
[1] Peking Univ, Inst Comp Sci & Technol, MOE Key Lab Computat Linguist, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1162/COLI_a_00253
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
From the perspective of structural linguistics, we explore paradigmatic and syntagmatic lexical relations for Chinese POS tagging, an important and challenging task for Chinese language processing. Paradigmatic lexical relations are explicitly captured by word clustering on large-scale unlabeled data and are used to design new features to enhance a discriminative tagger. Syntagmatic lexical relations are implicitly captured by syntactic parsing in the constituency formalism, and are utilized via system combination. Experiments on the Penn Chinese Treebank demonstrate the importance of both paradigmatic and syntagmatic relations. Our linguistically motivated, hybrid approaches yield a relative error reduction of 18% in total over state-of-the-art baselines. Despite their effectiveness in boosting accuracy, the hybrid systems rely on computationally expensive parsers, which makes them inappropriate for many realistic NLP applications. In this article, we are therefore also concerned with improving tagging efficiency at test time. In particular, we use unlabeled data to transfer the predictive power of hybrid models to simple sequence models: hybrid systems are used to create large-scale pseudo training data for cheap models. Experimental results show that the re-compiled models not only achieve high accuracy with respect to per-token classification, but also serve well as a front end to a parser.
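The transfer step described in the abstract (an expensive system labels unlabeled text, and a cheap model is trained on the result) can be illustrated with a toy sketch. This is not the authors' implementation: `teacher_tag` is a hypothetical stand-in for the hybrid tagger, the "student" is a simple most-frequent-tag lookup instead of a sequence model, and the tiny lexicon and sentences are invented for illustration only.

```python
from collections import Counter, defaultdict

def teacher_tag(sentence):
    """Hypothetical stand-in for an expensive hybrid tagger (toy lexicon)."""
    lexicon = {"我": "PN", "爱": "VV", "北京": "NR",
               "他": "PN", "去": "VV", "上海": "NR"}
    # Unknown words fall back to NN (common noun) in this toy setup.
    return [lexicon.get(word, "NN") for word in sentence]

def build_pseudo_data(unlabeled_sentences, tagger):
    """Label raw sentences with the teacher to create pseudo training data."""
    return [list(zip(sent, tagger(sent))) for sent in unlabeled_sentences]

def train_unigram_student(tagged_sentences):
    """Train a cheap model: map each word to its most frequent pseudo tag."""
    counts = defaultdict(Counter)
    for sent in tagged_sentences:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def student_tag(model, sentence, default="NN"):
    """Tag with the cheap model only; the teacher is not needed at test time."""
    return [model.get(word, default) for word in sentence]

# Assumed toy "unlabeled" corpus of pre-segmented sentences.
unlabeled = [["我", "爱", "北京"], ["他", "去", "上海"]]
pseudo = build_pseudo_data(unlabeled, teacher_tag)
student = train_unigram_student(pseudo)
print(student_tag(student, ["他", "爱", "上海"]))  # ['PN', 'VV', 'NR']
```

The point of the design is that the costly component runs only once, offline, over the unlabeled data; at test time only the cheap model is applied.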
Pages: 391-419
Number of pages: 29
Related papers
50 in total
  • [31] High performance part-of-speech tagging of Bulgarian
    Doychinova, V
    Mihov, S
    ARTIFICIAL INTELLIGENCE: METHODOLOGY, SYSTEMS, AND APPLICATIONS, PROCEEDINGS, 2004, 3192 : 246 - 255
  • [32] Analyzing Tagging Accuracy of Part-of-Speech Taggers
    Khin, Nyein Pyae Pyae
    Aung, Than Nwe
    GENETIC AND EVOLUTIONARY COMPUTING, VOL II, 2016, 388 : 347 - 354
  • [33] Part-of-speech tagging with recurrent neural networks
    Pérez-Ortiz, JA
    Forcada, ML
    IJCNN'01: INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-4, PROCEEDINGS, 2001, : 1588 - 1592
  • [34] Part-of-speech tagging using genetic algorithms
    Department of Computer Science and Engineering, Lovely Professional University, Jalandhar
    Punjab, India
    Int. J. Simul. Syst. Sci. Technol., 6 (11.1-11.7): : 11.1 - 11.7
  • [35] Dual Decomposition for Vietnamese Part-of-Speech Tagging
    Bach, Ngo Xuan
    Hiraishi, Kunihiko
    Le Minh, Nguyen
    Shimazu, Akira
    17TH INTERNATIONAL CONFERENCE IN KNOWLEDGE BASED AND INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS - KES2013, 2013, 22 : 123 - 131
  • [36] Part-of-speech tagging with two sequential transducers
    Kempe, A
    COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS 2000, 2001, (37): : 88 - 96
  • [37] Part-Of-Speech Tagging for Social Media Texts
    Neunerdt, Melanie
    Trevisan, Bianka
    Reyer, Michael
    Mathar, Rudolf
    LANGUAGE PROCESSING AND KNOWLEDGE IN THE WEB, 2013, 8105 : 139 - 150
  • [38] Improved estimation for unsupervised part-of-speech tagging
    Wang, QI
    Schuurmans, D
    Proceedings of the 2005 IEEE International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE'05), 2005, : 219 - 224
  • [39] Automatic Word Segmentation and Part-Of-Speech Tagging for Classical Chinese Based on Radicals
    Chang, Bolin
    Yuan, Y.
    Li, B.
    Xu, Z.
    Feng, M.
    Wang, D.
    Data Analysis and Knowledge Discovery, 2024, 8 (11) : 102 - 113
  • [40] FarsiTag: A part-of-speech tagging system for Persian
    Rezai, Mohammad Javad
    Miangah, Tayebeh Mosavi
    DIGITAL SCHOLARSHIP IN THE HUMANITIES, 2017, 32 (03) : 632 - 642