ModsNet: Performance-aware Top-k Model Search using Exemplar Datasets

Cited by: 0
Authors
Wang, Mengying [1 ]
Ma, Hanchao [1 ]
Guan, Sheng [1 ]
Bian, Yiyang [1 ]
Che, Haolai [1 ]
Daundkar, Abhishek [1 ]
Sehirlioglu, Alp [1 ]
Wu, Yinghui [1 ]
Affiliations
[1] Case Western Reserve Univ, Cleveland, OH 44106 USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2024, Vol. 17, No. 12
DOI: 10.14778/3685800.3685899
CLC number: TP [Automation technology, computer technology]
Discipline code: 0812
Abstract
We demonstrate ModsNet, a search tool for pre-trained data science MODelS recommendatioN using an Exemplar daTaset. Given a set of pre-trained data science models, an "example" input dataset, and a user-specified performance metric, ModsNet answers the following query: "what are the top-k models with the best expected performance for the input data?" The need to search for high-quality pre-trained models is evident in data-driven analysis. Inspired by the "query by example" paradigm, ModsNet does not require users to write complex queries; they only provide an "exemplar" dataset, a task description, and a performance measure as input, and ModsNet automatically suggests the top-k matching models that are expected to perform the task well over the provided sample dataset. ModsNet utilizes a knowledge graph to integrate model performance over datasets and synchronizes it with a bipartite graph neural network to estimate model performance, reduce inference cost, and promptly answer top-k model search queries. To cope with strict cold start (a new dataset arrives for which no historical performance of any registered model has been observed), it performs a dynamic, cost-bounded "probe-and-select" strategy to incrementally identify promising models. We demonstrate the application of ModsNet in enabling efficient scientific data analysis.
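To make the query concrete, here is a minimal sketch in Python of the "query by example" interface described above. It is an illustration under assumptions, not ModsNet's actual API: the function top_k_models, the RankedModel record, and the pluggable estimate callback (standing in for the knowledge-graph/GNN performance estimator) are all hypothetical names.

from dataclasses import dataclass

@dataclass
class RankedModel:
    # One entry in a top-k answer: a registered model and its score.
    model_id: str
    estimated_performance: float

def top_k_models(exemplar, task, metric, registry, estimate, k=5):
    # Return the k registered models with the best estimated performance
    # for the given exemplar dataset, task, and metric. The `estimate`
    # callback stands in for ModsNet's GNN-based performance estimator.
    scored = [RankedModel(m, estimate(m, exemplar, task, metric))
              for m in registry]
    scored.sort(key=lambda r: r.estimated_performance, reverse=True)
    return scored[:k]

# Toy usage with a dummy estimator in place of the trained GNN.
registry = ["resnet18", "xgboost-clf", "mlp-small"]
dummy = lambda m, x, t, metric: (hash((m, t, metric)) % 100) / 100.0
print(top_k_models([[0.1, 0.7]], "classification", "f1", registry, dummy, k=2))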
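The cold-start handling, a dynamic, cost-bounded "probe-and-select" strategy, can be pictured as the loop below. This is a plausible reading under stated assumptions, not the paper's algorithm: the cheapest-first selection rule, the probe and cost callbacks, and the scalar budget are illustrative.

def probe_and_select(candidates, probe, cost, budget, k):
    # Cost-bounded probe-and-select sketch for strict cold start:
    # `probe(model_id)` runs a model on the new dataset and returns an
    # observed score; `cost(model_id)` is its probe cost. Probing stops
    # once the budget is exhausted; the k best observed models win.
    observed, spent = {}, 0.0
    remaining = list(candidates)
    while remaining:
        # Selection rule (an assumption): probe the cheapest unprobed
        # model next, so more candidates fit under the budget.
        m = min(remaining, key=cost)
        if spent + cost(m) > budget:
            break
        remaining.remove(m)
        spent += cost(m)
        observed[m] = probe(m)
    return sorted(observed, key=observed.get, reverse=True)[:k]

# Toy usage: three candidates, unit probe costs, budget for two probes.
scores = {"resnet18": 0.81, "xgboost-clf": 0.77, "mlp-small": 0.64}
print(probe_and_select(scores, probe=scores.get, cost=lambda m: 1.0,
                       budget=2.0, k=1))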
Pages: 4