Selecting Top-k Data Science Models by Example Dataset

Cited by: 1

Authors:

Wang, Mengying [1]

Guan, Sheng [1]

Ma, Hanchao [1]

Bian, Yiyang [1]

Che, Haolai [1]

Daundkar, Abhishek [1]

Sehirlioglu, Alp [1]

Wu, Yinghui [1]

Affiliations:

[1] Case Western Reserve Univ, Cleveland, OH 44106 USA

Source:

PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023 | 2023

Keywords:

Model Selection; GNN-Based Recommendation; Knowledge Graph

DOI:

10.1145/3583780.3615051

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Data analytical pipelines routinely involve various domain-specific data science models. Such models require expensive manual or training effort and often incur high validation costs (e.g., via scientific simulation analysis). Meanwhile, high-value models are still created ad hoc, remain isolated, and are underutilized by the broader community. Searching for and accessing suitable models for data analysis pipelines is desirable yet challenging for users without domain knowledge. This paper introduces ModsNet, a novel MODel SelectioN framework that only requires an Example daTaset. (1) We investigate the following problem: given a library of pre-trained models, a limited amount of historical observations of their performance, and an "example" dataset as a query, return the k models that are expected to perform best over the query dataset. (2) We formulate a regression problem and introduce a knowledge-enhanced framework using a model-data interaction graph. (3) Unlike traditional methods, ModsNet uses a dynamic, cost-bounded "probe-and-select" strategy to incrementally identify promising pre-trained models in a strict cold-start scenario (when a new dataset without any interaction with existing models is given). (4) To reduce the learning cost, we develop a clustering-based sparsification strategy to prune unpromising models and their interactions. (5) We showcase ModsNet built on top of a crowdsourced materials knowledge base platform. Our experiments verify its effectiveness, efficiency, and applications over real-world analytical pipelines.
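The top-k selection problem stated in the abstract can be illustrated with a minimal sketch. This is not ModsNet: it swaps the paper's graph-based regression for a plain similarity-weighted average of observed scores, and every name in it (`top_k_models`, `dataset_features`, `performance`, the model and dataset ids, and the two-dimensional feature vectors) is a hypothetical stand-in chosen for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_models(query_features, dataset_features, performance, k):
    """Return the k model ids with the highest predicted score on the query dataset.

    Assumption (not from the paper): a model's score on the query is estimated
    as the average of its historical scores, weighted by how similar each
    historical dataset is to the query dataset.
    """
    totals, weights = {}, {}
    for (model, dataset), score in performance.items():
        # Negative similarities are clipped so they cannot flip the ranking.
        w = max(cosine(query_features, dataset_features[dataset]), 0.0)
        totals[model] = totals.get(model, 0.0) + w * score
        weights[model] = weights.get(model, 0.0) + w
    predicted = {m: totals[m] / weights[m] for m in totals if weights[m] > 0}
    # Sort by descending predicted score; break ties by model id.
    return sorted(predicted, key=lambda m: (-predicted[m], m))[:k]


# Hypothetical historical observations: (model, dataset) -> observed score.
dataset_features = {"d1": [1.0, 0.0], "d2": [0.0, 1.0]}
performance = {
    ("m_a", "d1"): 0.9, ("m_a", "d2"): 0.2,
    ("m_b", "d1"): 0.3, ("m_b", "d2"): 0.8,
    ("m_c", "d1"): 0.6, ("m_c", "d2"): 0.6,
}

# The query dataset resembles d1, so models that did well on d1 rank first.
best_two = top_k_models([1.0, 0.1], dataset_features, performance, k=2)
```

A strict cold-start query, as described in the abstract, has no observed scores of its own, which is why this sketch relies only on dataset features to relate the query to historical datasets.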


Pages: 2686-2695

Page count: 10
