Active Learning for Chinese Word Segmentation on Judgements

Times cited: 1
Authors
Yan, Qian [1]
Wang, Limin [1]
Li, Shoushan [1]
Liu, Huan [1]
Zhou, Guodong [1]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Nat Language Proc Lab, Suzhou, Peoples R China
Source
NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2017 | 2018 / Vol. 10619
Keywords
Chinese word segmentation; Active learning; Judgements
DOI
10.1007/978-3-319-73618-1_73
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper addresses Chinese word segmentation on judgements. The main challenge for this task is the lack of an annotated corpus. To alleviate this problem, the paper proposes an active learning approach. Specifically, starting from a few initial annotated samples, the approach selects informative characters and then selects the context around these characters for annotation. When selecting informative characters, the approach considers not only the uncertainty of a sample but also its redundancy. Furthermore, the paper adopts a local annotation strategy, which selects a substring around each informative character rather than the whole sentence and thus further reduces the annotation effort. The empirical study demonstrates that the proposed approach effectively reduces annotation cost and performs better than baseline sample selection strategies under the same scale of annotation.
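The selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the entropy-based uncertainty score, the trigram-based redundancy check, and the names `boundary_entropy`, `select_spans`, `budget`, and `window` are all assumptions made for illustration. The input `probs` stands in for per-character word-boundary probabilities from some segmenter trained on the initial annotated samples.

```python
import math


def boundary_entropy(p):
    """Binary entropy of a boundary probability p (assumed uncertainty score).

    The score peaks at p = 0.5, where the model is least sure whether a
    word boundary follows the character.
    """
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_spans(sentence, probs, budget=2, window=2):
    """Return (start, end) substring spans to send for local annotation.

    Characters are ranked by uncertainty. A character is skipped as
    redundant when its surrounding trigram was already chosen (a
    hypothetical redundancy test, not the one from the paper).
    """
    ranked = sorted(
        range(len(sentence)),
        key=lambda i: boundary_entropy(probs[i]),
        reverse=True,
    )
    seen_contexts = set()
    spans = []
    for i in ranked:
        if len(spans) >= budget:
            break
        context = sentence[max(0, i - 1): i + 2]
        if context in seen_contexts:
            continue  # same local context already selected
        seen_contexts.add(context)
        # Local annotation: keep only a window around the character,
        # not the whole sentence.
        spans.append((max(0, i - window), min(len(sentence), i + window + 1)))
    return spans
```

Calling `select_spans(sentence, probs)` with a string and a list of boundary probabilities of the same length returns index pairs that can be sliced out of the sentence and shown to an annotator. Raising `window` trades annotation cost for more context per selected character.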
Pages: 839-848
Number of pages: 10