Active Learning With Sampling by Uncertainty and Density for Data Annotations

Cited by: 91

Authors:

Zhu, Jingbo [1,2]

Wang, Huizhen [1,2]

Tsou, Benjamin K. [3]

Ma, Matthew [4]

Affiliations:

[1] Northeastern Univ, Minist Educ, Key Lab Med Image Comp, Shenyang 110004, Peoples R China

[2] Northeastern Univ, Nat Language Proc Lab, Shenyang 110004, Peoples R China

[3] City Univ Hong Kong, Language Informat Sci Res Ctr, Hong Kong, Hong Kong, Peoples R China

[4] Sci Works, Princeton Jct, NJ 08550 USA

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010 / Vol. 18 / Issue 06

Funding:

National Science Foundation (USA);

Keywords:

Active learning; density-based re-ranking; sampling by uncertainty and density; text classification; uncertainty sampling; word sense disambiguation (WSD);

DOI:

10.1109/TASL.2009.2033421

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

To solve the knowledge bottleneck problem, active learning has been widely used for its ability to automatically select the most informative unlabeled examples for human annotation. One of the key enabling techniques of active learning is uncertainty sampling, which uses one classifier to identify unlabeled examples with the least confidence. Uncertainty sampling often presents problems when outliers are selected. To solve the outlier problem, this paper presents two techniques, sampling by uncertainty and density (SUD) and density-based re-ranking. Both techniques prefer not only the most informative example in terms of uncertainty criterion, but also the most representative example in terms of density criterion. Experimental results of active learning for word sense disambiguation and text classification tasks using six real-world evaluation data sets demonstrate the effectiveness of the proposed methods.
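The abstract states the idea of combining uncertainty with density but gives no formulas, so the following is only a minimal sketch under stated assumptions: uncertainty is taken to be the entropy of the classifier's predicted class distribution, density is taken to be the average cosine similarity between an example and its k most similar unlabeled neighbours, and the two are combined by a simple product. All function names, the parameter `k`, and the toy data are hypothetical and chosen for illustration; the paper itself should be consulted for the exact definitions.

```python
import math

def entropy(probs):
    """Shannon entropy of a class distribution (higher means less confident)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def cosine(u, v):
    """Cosine similarity between two feature vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def density(i, vectors, k):
    """Assumed density measure: mean cosine similarity of example i to its k nearest neighbours."""
    sims = sorted(
        (cosine(vectors[i], v) for j, v in enumerate(vectors) if j != i),
        reverse=True,
    )
    top = sims[:k]
    return sum(top) / len(top) if top else 0.0

def select_by_uncertainty_and_density(vectors, probs, k=2):
    """Return the index with the highest (entropy * density) score, plus all scores."""
    scores = [entropy(p) * density(i, vectors, k) for i, p in enumerate(probs)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Toy pool: three similar examples and one outlier (index 3).
vectors = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0]]
# Hypothetical classifier outputs; the outlier is the least confident example.
probs = [[0.6, 0.4], [0.9, 0.1], [0.9, 0.1], [0.5, 0.5]]

best, scores = select_by_uncertainty_and_density(vectors, probs, k=2)
most_uncertain = max(range(len(probs)), key=lambda i: entropy(probs[i]))
```

In this toy pool, plain uncertainty sampling would pick the outlier, whereas weighting by density shifts the choice to an uncertain example that lies in a dense region, which is the behaviour the abstract describes.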

Pages: 1323-1331

Number of pages: 9