AutoML to Date and Beyond: Challenges and Opportunities

Cited by: 158
Authors
Karmaker, Shubhra Kanti [1]
Hassan, Md Mahadi [1]
Smith, Micah J. [2]
Xu, Lei [2]
Zhai, Chengxiang [3]
Veeramachaneni, Kalyan [2]
Affiliations
[1] Auburn Univ, Samuel Ginn Coll Engn, 3106 Shelby Ctr,345 W Magnolia Ave, Auburn, AL 36849 USA
[2] MIT, LIDS, MIT Stata Ctr, 32 Vassar St,Room 32-D712, Cambridge, MA 02139 USA
[3] Univ Illinois, Thomas M Siebel Ctr Comp Sci, 201 North Goodwin Ave MC 258, Urbana, IL 61801 USA
Keywords
Automated machine learning; interactive data science; democratization of artificial intelligence; predictive analytics; feature selection
DOI
10.1145/3470918
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
As big data becomes ubiquitous across domains, and more and more stakeholders aspire to make the most of their data, demand for machine learning tools has spurred researchers to explore the possibilities of automated machine learning (AutoML). AutoML tools aim to make machine learning accessible for non-machine learning experts (domain experts), to improve the efficiency of machine learning, and to accelerate machine learning research. But although automation and efficiency are among AutoML's main selling points, the process still requires human involvement at a number of vital steps, including understanding the attributes of domain-specific data, defining prediction problems, creating a suitable training dataset, and selecting a promising machine learning technique. These steps often require a prolonged back-and-forth that makes this process inefficient for domain experts and data scientists alike and keeps so-called AutoML systems from being truly automatic. In this review article, we introduce a new classification system for AutoML systems, using a seven-tiered schematic to distinguish these systems based on their level of autonomy. We begin by describing what an end-to-end machine learning pipeline actually looks like, and which subtasks of the machine learning pipeline have been automated so far. We highlight those subtasks that are still done manually (generally by a data scientist) and explain how this limits domain experts' access to machine learning. Next, we introduce our novel level-based taxonomy for AutoML systems and define each level according to the scope of automation support provided. Finally, we lay out a roadmap for the future, pinpointing the research required to further automate the end-to-end machine learning pipeline and discussing important challenges that stand in the way of this ambitious goal.
Pages: 36