Selecting Top-k Data Science Models by Example Dataset

Cited by: 1
Authors
Wang, Mengying [1]
Guan, Sheng [1]
Ma, Hanchao [1]
Bian, Yiyang [1]
Che, Haolai [1]
Daundkar, Abhishek [1]
Sehirlioglu, Alp [1]
Wu, Yinghui [1]
Affiliations
[1] Case Western Reserve Univ, Cleveland, OH 44106 USA
Source
Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, CIKM 2023 | 2023
Keywords
Model Selection; GNN-Based Recommendation; Knowledge Graph
DOI
10.1145/3583780.3615051
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data analysis pipelines routinely involve various domain-specific data science models. Such models require expensive manual or training effort and often incur high validation costs (e.g., via scientific simulation analysis). Meanwhile, high-value models are still created ad hoc, remain isolated, and are underutilized by the broader community. Searching for and accessing suitable models for data analysis pipelines is desirable yet challenging for users without domain knowledge. This paper introduces ModsNet, a novel MODel SelectioN framework that only requires an Example daTaset. (1) We investigate the following problem: given a library of pre-trained models, a limited number of historical observations of their performance, and an "example" dataset as a query, return the k models that are expected to perform best on the query dataset. (2) We formulate this as a regression problem and introduce a knowledge-enhanced framework that uses a model-data interaction graph. Unlike traditional methods, ModsNet uses a dynamic, cost-bounded "probe-and-select" strategy to incrementally identify promising pre-trained models in a strict cold-start scenario (when a new dataset is given that has no interaction with existing models). To reduce the learning cost, we develop a clustering-based sparsification strategy that prunes unpromising models and their interactions. We showcase ModsNet built on top of a crowdsourced materials knowledge base platform. Our experiments verify its effectiveness, efficiency, and applications over real-world analytical pipelines.
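The top-k selection problem stated in the abstract can be illustrated with a minimal sketch. This is not the ModsNet method (which learns over a model-data interaction graph); it substitutes a simple similarity-weighted average of past scores as the regression step. The function name `top_k_models`, the dataset feature vectors, and the toy scores are all hypothetical and chosen only for illustration.

```python
import math

def top_k_models(history, features, query_features, k):
    """Rank models by a similarity-weighted average of their past scores.

    history:  {dataset_name: {model_name: observed_score}}  (assumed layout)
    features: {dataset_name: [float, ...]}  one descriptive vector per dataset
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    weighted, weights = {}, {}
    for name, scores in history.items():
        # Datasets that look more like the query contribute more evidence.
        w = max(cosine(features[name], query_features), 0.0)
        for model, score in scores.items():
            weighted[model] = weighted.get(model, 0.0) + w * score
            weights[model] = weights.get(model, 0.0) + w

    # Predicted score per model; models with no usable evidence are skipped.
    predicted = {m: weighted[m] / weights[m] for m in weighted if weights[m] > 0}
    return sorted(predicted, key=predicted.get, reverse=True)[:k]

# Toy data (hypothetical): two historical datasets, three models.
history = {
    "d1": {"m_a": 0.9, "m_b": 0.4},
    "d2": {"m_a": 0.3, "m_b": 0.8, "m_c": 0.6},
}
features = {"d1": [1.0, 0.0], "d2": [0.0, 1.0]}
print(top_k_models(history, features, [0.9, 0.1], k=2))
```

A query dataset that resembles `d1` favors the model that scored well on `d1`. In a strict cold-start setting such similarity signals may not be available, which is the case the abstract's "probe-and-select" strategy is described as addressing.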
Pages: 2686-2695
Number of pages: 10