Hybrid active learning for reducing the annotation effort of operators in classification systems

Cited by: 76
Author(s)
Lughofer, Edwin [1 ]
Affiliation(s)
[1] Johannes Kepler Univ Linz, Dept Knowledge Based Math Syst, Fuzzy Log Lab Linz Hagenberg, Linz, Austria
Keywords
Active learning; Reduction of annotation effort; Unsupervised selection criteria; Certainty-based selection; On-line update of classifiers; FUZZY; FEATURES; CLASSIFIERS; FLEXFIS;
DOI
10.1016/j.patcog.2011.08.009
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Active learning is understood as any form of learning in which the learning algorithm has some control over the input samples, selecting the samples from which it builds up the model. In this paper, we propose a novel active learning strategy for data-driven classifiers, which is based on an unsupervised criterion during the off-line training phase, followed by a supervised certainty-based criterion during incremental on-line training. In this sense, we call the new strategy hybrid active learning. Sample selection in the first phase is conducted from scratch (i.e. no initial labels/learners are needed) based on purely unsupervised criteria obtained from clusters: samples lying near cluster centers and near the borders of clusters are expected to be the most informative ones regarding the distribution characteristics of the classes. In the second phase, the task is to update already trained classifiers during on-line mode with the most important samples in order to dynamically guide the classifier to more predictive power. Both strategies are essential for reducing the annotation and supervision effort of operators in off-line and on-line classification systems, as operators only have to label a select subset of the off-line training data and give feedback only on specific occasions during the on-line phase, respectively. The new active learning strategy is evaluated on real-world data sets from the UCI repository and collected at on-line quality control systems. The results show that an active learning based selection of training samples (1) does not weaken the classification accuracies compared to using all samples in the training process and (2) can outperform classifiers built on randomly selected data samples. (C) 2011 Elsevier Ltd. All rights reserved.
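The two selection criteria described in the abstract can be illustrated with a minimal sketch. The following assumptions are mine, not from the paper: cluster centers are already available (e.g. from any clustering run), "near a center" is measured by distance to the nearest center, "near a border" by how close the two nearest-center distances are to each other, and on-line certainty is the classifier's maximum class probability. Function names and thresholds are hypothetical.

```python
import numpy as np

def offline_select(X, centers, budget=6):
    """Unsupervised phase (sketch): pick samples lying near cluster
    centers and near the borders between clusters."""
    # Distance of every sample to every cluster center.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    ds = np.sort(d, axis=1)
    center_score = ds[:, 0]                # small -> close to some center
    border_score = ds[:, 1] - ds[:, 0]     # small -> roughly equidistant
    half = budget // 2                     #          between two centers
    pick = np.concatenate([np.argsort(center_score)[:half],
                           np.argsort(border_score)[:budget - half]])
    return np.unique(pick)                 # indices to send for labeling

def online_select(proba, threshold=0.7):
    """Supervised certainty-based phase (sketch): request operator
    feedback only when the classifier's certainty, taken here as the
    maximum class probability, falls below a threshold."""
    return np.where(proba.max(axis=1) < threshold)[0]
```

Samples chosen by `offline_select` are then labeled by the operator and used for initial training; during on-line operation, only the samples flagged by `online_select` trigger a feedback request and an incremental classifier update.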
Pages: 884-896
Page count: 13