Properties of a new adaptive sampling method with applications to scalable learning

Times cited: 0
Authors
Chen, Jianhua [1]
Affiliations
[1] Louisiana State Univ, Sch EECS, Div Comp Sci & Engn, Baton Rouge, LA 70803 USA
Keywords
Adaptive sampling; sample size; web mining; scalable learning
DOI
10.3233/WEB-150322
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Scalability has become an increasingly important issue for data mining and knowledge discovery in this "big data" era. Random sampling can be used to tackle the problem of scalable learning. Adaptive sampling is superior to traditional batch sampling methods because it typically requires a much smaller sample size, and is thus more efficient, while still guaranteeing a given level of estimation accuracy and confidence. In recent work, the author and colleagues developed a new adaptive sampling method, which has since been applied to build an efficient, scalable boosting learning algorithm. This paper presents new results from a theoretical analysis of the proposed sampling method. Specifically, for controlling absolute error, a bound is established on the probability that the new sampling method stops too early. It is also shown that the criterion function for controlling relative error is in fact a convex function. A new variant of the sampling method is also presented. Empirical simulation results indicate that both the new variant and the original algorithm often use a significantly smaller sample size (i.e., the number of sampled instances) than the batch sampling method while maintaining competitive accuracy and confidence. An empirical study is also reported showing efficient classification of Web advertisements using a sampling-based ensemble learner.
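For context on the batch baseline that the abstract compares against, the sketch below computes a fixed sample size from Hoeffding's inequality, which is sufficient to estimate a Bernoulli mean to within absolute error `eps` with confidence at least `1 - delta`. This is not the paper's adaptive method, whose stopping criterion is not given in this record. The function name `hoeffding_batch_size` and the parameter names are assumptions chosen for illustration.

```python
import math


def hoeffding_batch_size(eps: float, delta: float) -> int:
    """Fixed (batch) sample size from Hoeffding's inequality.

    For i.i.d. samples in [0, 1], P(|mean_hat - mean| >= eps) <= 2 * exp(-2 * n * eps**2).
    Solving 2 * exp(-2 * n * eps**2) <= delta for n gives the bound below.
    Hypothetical helper for illustration; not taken from the paper.
    """
    if not (0.0 < eps < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("eps and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


if __name__ == "__main__":
    # Worst-case sample size, fixed before any data is seen.
    print(hoeffding_batch_size(0.05, 0.05))
```

Because this bound ignores the observed data, it is a worst-case figure. An adaptive scheme instead decides when to stop based on the samples seen so far, which is where the sample-size savings described in the abstract would come from.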
Pages: 215-227
Number of pages: 13