Towards Accurate and Efficient Chinese Part-of-Speech Tagging

Cited by: 9

Authors:

Sun, Weiwei [1]

Wan, Xiaojun [1]

Affiliation:

[1] Peking Univ, Inst Comp Sci & Technol, MOE Key Lab Computat Linguist, Beijing 100871, Peoples R China

Source:

COMPUTATIONAL LINGUISTICS | 2016 / Vol. 42 / Issue 03

Funding:

National Natural Science Foundation of China;

DOI:

10.1162/COLI_a_00253

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

From the perspective of structural linguistics, we explore paradigmatic and syntagmatic lexical relations for Chinese POS tagging, an important and challenging task for Chinese language processing. Paradigmatic lexical relations are explicitly captured by word clustering on large-scale unlabeled data and are used to design new features to enhance a discriminative tagger. Syntagmatic lexical relations are implicitly captured by syntactic parsing in the constituency formalism, and are utilized via system combination. Experiments on the Penn Chinese Treebank demonstrate the importance of both paradigmatic and syntagmatic relations. Our linguistically motivated, hybrid approaches yield a relative error reduction of 18% in total over state-of-the-art baselines. Despite their effectiveness in boosting accuracy, computationally expensive parsers make hybrid systems inappropriate for many realistic NLP applications. In this article, we are also concerned with improving tagging efficiency at test time. In particular, we explore unlabeled data to transfer the predictive power of hybrid models to simple sequence models. Specifically, hybrid systems are utilized to create large-scale pseudo training data for cheap models. Experimental results illustrate that the re-compiled models not only achieve high accuracy with respect to per-token classification, but also serve well as a front end to a parser.
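
The last idea in the abstract, using an expensive hybrid system to label unlabeled text and then training a cheap model on the result, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `hybrid_tagger` is a hypothetical stand-in for the expensive system (here just a lookup table), the "cheap model" is a most-frequent-tag unigram tagger rather than a real sequence model, and the sentences and tags are toy data chosen for illustration.

```python
from collections import Counter, defaultdict


def train_unigram_tagger(tagged_sentences):
    """Train a cheap tagger that maps each word to its most frequent tag."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag_label in sentence:
            counts[word][tag_label] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(model, words, default="NN"):
    """Tag words with the cheap model; unseen words get the default tag."""
    return [(w, model.get(w, default)) for w in words]


def hybrid_tagger(words):
    """Hypothetical stand-in for an expensive, accurate hybrid system."""
    lexicon = {"我": "PN", "他": "PN", "喜欢": "VV", "读": "VV", "买": "VV", "书": "NN"}
    return [(w, lexicon.get(w, "NN")) for w in words]


# Small hand-labeled set plus a larger pool of unlabeled sentences (toy data).
gold = [[("我", "PN"), ("喜欢", "VV"), ("书", "NN")]]
unlabeled = [["他", "买", "书"], ["我", "读", "书"]]

# Step 1: the expensive system produces pseudo training data.
pseudo = [hybrid_tagger(sentence) for sentence in unlabeled]

# Step 2: the cheap model is trained on gold plus pseudo data.
student = train_unigram_tagger(gold + pseudo)
baseline = train_unigram_tagger(gold)

test_sentence = ["他", "读", "书"]
result = tag(student, test_sentence)
baseline_result = tag(baseline, test_sentence)
print(result)
print(baseline_result)
```

In this toy setting the model trained only on the gold sentence falls back to the default tag for words it has never seen, while the model trained with pseudo data has picked up the expensive system's decisions for those words. At test time only the cheap model runs.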

Pages: 391-419

Page count: 29

