Sample, Estimate, Tune: Scaling Bayesian Auto-tuning of Data Science Pipelines

Cited by: 4
Authors
Anderson, Alec [1 ]
Dubois, Sebastien [2 ]
Cuesta-Infante, Alfredo [3 ]
Veeramachaneni, Kalyan [1 ]
Affiliations
[1] MIT, LIDS, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[2] Stanford Univ, Palo Alto, CA 94304 USA
[3] Univ Rey Juan Carlos, Madrid, Spain
Source
2017 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA) | 2017
Keywords
OPTIMIZATION;
DOI
10.1109/DSAA.2017.82
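Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract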
In this paper, we describe a system for sequential hyperparameter optimization that scales to work with complex pipelines and large datasets. Currently, the state of the art in hyperparameter optimization improves on randomized and grid search by using sequential Bayesian optimization to explore the space of hyperparameters in a more informed way. These methods, however, are not scalable, as the entire data science pipeline must still be evaluated on all the data. By designing a subsampling-based approach to estimate pipeline performance, along with a distributed evaluation system, we provide a scalable solution, which we illustrate using complex image and text data pipelines. For three pipelines, we show that we are able to gain similar performance improvements while computing on substantially less data.
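As a rough illustration of the subsampling idea mentioned in the abstract, the sketch below estimates a score on the full dataset from a few small random subsamples. It is not the authors' estimator, which the abstract does not specify; `subsample_estimate`, `toy_accuracy`, the synthetic data, and all parameter values are hypothetical names and choices made only for this example.

```python
import random
import statistics


def subsample_estimate(evaluate, data, sample_size, n_samples, seed=0):
    """Estimate evaluate(data) by averaging scores over random subsamples.

    Returns (mean score, standard error of the mean).
    Hypothetical helper, not taken from the paper.
    """
    rng = random.Random(seed)
    # Each subsample is drawn without replacement from the full dataset.
    scores = [evaluate(rng.sample(data, sample_size)) for _ in range(n_samples)]
    mean = statistics.mean(scores)
    stderr = statistics.stdev(scores) / n_samples ** 0.5 if n_samples > 1 else 0.0
    return mean, stderr


def toy_accuracy(rows):
    """Stand-in for an expensive pipeline: accuracy of a fixed threshold rule."""
    return sum((x > 0.5) == label for x, label in rows) / len(rows)


# Synthetic dataset (assumption): 10,000 points with roughly 10% label noise.
data_rng = random.Random(42)
data = []
for _ in range(10000):
    x = data_rng.random()
    label = (x > 0.5) if data_rng.random() < 0.9 else (x <= 0.5)
    data.append((x, label))

full_score = toy_accuracy(data)  # evaluated on all rows
est, err = subsample_estimate(toy_accuracy, data, sample_size=500, n_samples=10)
print(f"full: {full_score:.3f}  subsampled: {est:.3f} +/- {err:.3f}")
```

In this toy setting the estimate touches at most half of the rows, and the standard error gives a sequential optimizer a sense of how far to trust the cheaper score.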
Pages: 361-372
Page count: 12