Scalable Active Learning by Approximated Error Reduction

Cited by: 28
Authors
Fu, Weijie [1]
Wang, Meng [1]
Hao, Shijie [1]
Wu, Xindong [1]
Affiliations
[1] Hefei University of Technology, Hefei, Anhui, People's Republic of China
Source
KDD'18: PROCEEDINGS OF THE 24TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING | 2018
Keywords
active learning; query selection; efficient algorithms
DOI
10.1145/3219819.3219954
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We study the problem of active learning for multi-class classification on large-scale datasets. In this setting, existing active learning approaches built upon uncertainty measures are ineffective at discovering unknown regions, and those based on expected error reduction are inefficient owing to their huge time costs. To overcome these issues, this paper proposes a novel query selection criterion called approximated error reduction (AER). In AER, the error reduction of each candidate is estimated from an expected impact over all datapoints and an approximated ratio between the error reduction and the impact over its nearby datapoints. In particular, we utilize hierarchical anchor graphs to construct the candidate set as well as the nearby datapoint sets of these candidates. The benefit of this strategy is that it enables a hierarchical expansion of candidates as the number of labels increases, and allows us to further accelerate the AER estimation. We finally introduce AER into an efficient semi-supervised classifier for scalable active learning. Experiments on publicly available datasets with sizes ranging from thousands to millions of datapoints demonstrate the effectiveness of our approach.
Pages: 1396-1405
Number of pages: 10