Active learning for logistic regression: an evaluation

Authors
Andrew I. Schein
Lyle H. Ungar
Affiliations
[1] Department of Computer and Information Science, The University of Pennsylvania
Source
Machine Learning | 2007 / Volume 68
Keywords
Active learning; Logistic regression; Experimental design; Generalized linear models;
Abstract
Which active learning methods can we expect to yield good performance in learning binary and multi-category logistic regression classifiers? Addressing this question is a natural first step in providing robust solutions for active learning across a wide variety of exponential models, including maximum entropy, generalized linear, log-linear, and conditional random field models. For the logistic regression model we re-derive the variance reduction method known in experimental design circles as ‘A-optimality.’ We then run comparisons against different variations of the most widely used heuristic schemes, query by committee and uncertainty sampling, to discover which methods work best for different classes of problems and why. We find that among the strategies tested, the experimental design methods are most likely to match or beat a random sample baseline. The heuristic alternatives produced mixed results, with an uncertainty sampling variant called margin sampling and a derivative method called QBB-MM providing the most promising performance at very low computational cost. Computational running times of the experimental design methods were a bottleneck to the evaluations. Meanwhile, evaluation of the heuristic methods led to an accumulation of negative results. We explore alternative evaluation design parameters to test whether these negative results are merely an artifact of settings where experimental design methods can be applied. The results demonstrate a need for improved active learning methods that will provide reliable performance at a reasonable computational cost.
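The margin sampling heuristic highlighted in the abstract can be sketched for the binary case: query the unlabeled pool point whose predicted probability sits closest to the 0.5 decision boundary (i.e., the smallest margin between the two class probabilities). The simple gradient-descent fit and function names below are an illustrative assumption, not the paper's implementation.

```python
import numpy as np

def fit_logreg(X, y, lr=0.1, epochs=500):
    # Minimal gradient-descent fit of binary logistic regression
    # (labels in {0, 1}; X is assumed to include a bias column).
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))        # predicted P(y=1 | x)
        w -= lr * X.T @ (p - y) / len(y)        # average log-loss gradient
    return w

def margin_sample(w, X_pool):
    # Margin sampling: for two classes the margin is |P(y=1|x) - P(y=0|x)|,
    # which is smallest where P(y=1|x) is nearest 0.5.
    p = 1.0 / (1.0 + np.exp(-X_pool @ w))
    return int(np.argmin(np.abs(p - 0.5)))

# Toy usage: a 1-D labeled set (with bias column) and a small pool.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w = fit_logreg(X, y)
pool = np.array([[1.0, 0.05], [1.0, 3.0], [1.0, -3.0]])
query_idx = margin_sample(w, pool)   # the point nearest the boundary
```

In an active learning loop, the queried point would be labeled, moved from the pool to the training set, and the model refit; the appeal of this heuristic, per the abstract, is that each query costs only one forward pass over the pool.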
Pages: 235-265 (30 pages)