Sample, Estimate, Tune: Scaling Bayesian Auto-tuning of Data Science Pipelines

Cited by: 4
Authors
Anderson, Alec [1 ]
Dubois, Sebastien [2 ]
Cuesta-Infante, Alfredo [3 ]
Veeramachaneni, Kalyan [1 ]
Affiliations
[1] MIT, LIDS, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[2] Stanford Univ, Palo Alto, CA 94304 USA
[3] Univ Rey Juan Carlos, Madrid, Spain
Source
2017 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA) | 2017
Keywords
OPTIMIZATION;
DOI
10.1109/DSAA.2017.82
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
In this paper, we describe a system for sequential hyperparameter optimization that scales to complex pipelines and large datasets. The current state-of-the-art in hyperparameter optimization improves on randomized and grid search by using sequential Bayesian optimization to explore the hyperparameter space in a more informed way. These methods, however, are not scalable, as the entire data science pipeline must still be evaluated on all the data. By designing a subsampling-based approach to estimate pipeline performance, along with a distributed evaluation system, we provide a scalable solution, which we illustrate using complex image and text data pipelines. For three pipelines, we show that we achieve similar performance improvements while computing on substantially less data.
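A minimal sketch of the idea the abstract describes, not the authors' system: a Bayesian optimizer proposes hyperparameters, and each proposal is scored on a random subsample of the data rather than the full dataset. Here scikit-optimize's gp_minimize stands in for the Bayesian optimizer; the dataset, model, search space, and subsample_fraction are illustrative assumptions.

# Minimal sketch (not the authors' code): Bayesian tuning scored on a subsample.
# Assumes scikit-learn and scikit-optimize; all concrete choices are illustrative.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from skopt import gp_minimize
from skopt.space import Integer

X, y = load_digits(return_X_y=True)
rng = np.random.default_rng(0)

def estimate_score(params, subsample_fraction=0.2):
    # Estimate pipeline performance on a random subsample instead of all the data.
    n_estimators, max_depth = params
    idx = rng.choice(len(X), size=int(subsample_fraction * len(X)), replace=False)
    model = RandomForestClassifier(n_estimators=n_estimators,
                                   max_depth=max_depth, random_state=0)
    # gp_minimize minimizes, so return the negative cross-validated accuracy.
    return -cross_val_score(model, X[idx], y[idx], cv=3).mean()

search_space = [Integer(10, 200, name="n_estimators"),
                Integer(2, 20, name="max_depth")]
result = gp_minimize(estimate_score, search_space, n_calls=15, random_state=0)
print("best hyperparameters:", result.x, "estimated error:", result.fun)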
Pages: 361-372
Number of pages: 12