Strategies to Select Examples for Active Learning with Conditional Random Fields

Cited by: 5
Authors
Claveau, Vincent [1]
Kijak, Ewa [1]
Affiliations
[1] Univ Rennes 1, CNRS, IRISA, Campus Beaulieu, Rennes, France
Source
COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2017), PT I | 2018 / Vol. 10761
Keywords
CRF; Conditional random fields; Active learning; Semi-supervised learning; Statistical test of proportion
DOI
10.1007/978-3-319-77113-7_3
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Nowadays, many NLP problems are tackled as supervised machine learning tasks. Consequently, the cost of the expertise needed to annotate the examples is a widespread issue. Active learning offers a framework to address that issue, making it possible to control the annotation cost while maximizing classifier performance, but it relies on the key step of choosing which example will be proposed to the expert. In this paper, we examine and propose such selection strategies in the specific case of Conditional Random Fields (CRF), which are widely used in NLP. On the one hand, we propose a simple method to correct a bias of some state-of-the-art selection techniques. On the other hand, we detail an original approach to select the examples, based on respecting proportions in the datasets. These contributions are validated over a wide range of experiments involving several datasets and tasks, including named entity recognition, chunking, phonetization, and word sense disambiguation.
Pages: 30-43
Number of pages: 14