Recent advances in scaling-down sampling methods in machine learning

Cited by: 23
Authors
ElRafey A. [1]
Wojtusiak J. [1]
Affiliations
[1] Health Administration and Policy, George Mason University, Fairfax, VA
Keywords
data mining; evolutionary computation; health informatics; machine learning
DOI
10.1002/wics.1414
Abstract
Data sampling methods have been investigated for decades in the context of machine learning and statistical algorithms, with significant progress made in the past few years driven by strong interest in big data and distributed computing. Most recently, progress has been made in methods that can be broadly categorized into random sampling, including density-biased and nonuniform sampling methods; active learning methods, which are a type of semi-supervised learning and an area of intense research; and progressive sampling methods, which can be viewed as a combination of the above two approaches. A unified view of scaling-down sampling methods is presented in this article and complemented with descriptions of relevant published literature. WIREs Comput Stat 2017, 9:e1414. doi: 10.1002/wics.1414. For further resources related to this article, please visit the WIREs website. © 2017 Wiley Periodicals, Inc.
Related papers
142 in total
[1]  
(2012)
[2]  
Hilbert M., Lopez P., The world's technological capacity to store, communicate, and compute information, Science, 332, pp. 60-65, (2011)
[3]  
(2014)
[4]  
Settles B., Active learning, Synth Lect Artif Intell Mach Learn, 6, pp. 1-114, (2012)
[5]  
Tomanek K., Olsson F., A web survey on the use of active learning to support annotation of text data, In: Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing, pp. 45-48, (2009)
[6]  
Hesabi Z.R., Tari Z., Goscinski A., Fahad A., Khalil I., Queiroz C., Data summarization techniques for big data—a survey, Handbook on Data Centers, pp. 1109-1152, (2015)
[7]  
Vitter J.S., Random sampling with a reservoir, ACM Trans Math Softw, 11, pp. 37-57, (1985)
[8]  
Michalski R.S., On the selection of representative samples from large relational tables for inductive inference, (1975)
[9]  
Wald A., On the efficient design of statistical investigations, Ann Math Stat, 14, pp. 134-140, (1943)
[10]  
Liu H., Motoda H., Instance Selection and Construction for Data Mining, 608, (2013)