ModsNet: Performance-aware Top-k Model Search using Exemplar Datasets

Cited by: 0
Authors
Wang, Mengying [1 ]
Ma, Hanchao [1 ]
Guan, Sheng [1 ]
Bian, Yiyang [1 ]
Che, Haolai [1 ]
Daundkar, Abhishek [1 ]
Sehirlioglu, Alp [1 ]
Wu, Yinghui [1 ]
Affiliations
[1] Case Western Reserve Univ, Cleveland, OH 44106 USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2024, Vol. 17, No. 12
DOI: 10.14778/3685800.3685899
CLC number: TP [Automation technology, computer technology]
Discipline code: 0812
Abstract
We demonstrate ModsNet, a search tool for pre-trained data science MODelS recommendatioN using an Exemplar daTaset. Given a set of pre-trained data science models, an "example" input dataset, and a user-specified performance metric, ModsNet answers the following query: "what are the top-k models with the best expected performance for the input data?" The need to search for high-quality pre-trained models is evident in data-driven analysis. Inspired by the "query by example" paradigm, ModsNet does not require users to write complex queries; they only provide an "exemplar" dataset, a task description, and a performance measure as input, and ModsNet automatically suggests the top-k matching models that are expected to perform the task well over the provided sample dataset. ModsNet utilizes a knowledge graph to integrate model performance over datasets and synchronizes it with a bipartite graph neural network to estimate model performance, reduce inference cost, and promptly answer top-k model search queries. To cope with strict cold start (a new dataset arrives for which no historical performance of any registered model has been observed), it performs a dynamic, cost-bounded "probe-and-select" strategy to incrementally identify promising models. We demonstrate the application of ModsNet in enabling efficient scientific data analysis.
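To make the query concrete, here is a minimal sketch in Python of the "query by example" interface described above. It is an illustration under assumptions, not ModsNet's actual API: the function top_k_models, the RankedModel record, and the pluggable estimate callback (standing in for the knowledge-graph/GNN performance estimator) are all hypothetical names.

from dataclasses import dataclass

@dataclass
class RankedModel:
    # One entry in a top-k answer: a registered model and its score.
    model_id: str
    estimated_performance: float

def top_k_models(exemplar, task, metric, registry, estimate, k=5):
    # Return the k registered models with the best estimated performance
    # for the given exemplar dataset, task, and metric. The `estimate`
    # callback stands in for ModsNet's GNN-based performance estimator.
    scored = [RankedModel(m, estimate(m, exemplar, task, metric))
              for m in registry]
    scored.sort(key=lambda r: r.estimated_performance, reverse=True)
    return scored[:k]

# Toy usage with a dummy estimator in place of the trained GNN.
registry = ["resnet18", "xgboost-clf", "mlp-small"]
dummy = lambda m, x, t, metric: (hash((m, t, metric)) % 100) / 100.0
print(top_k_models([[0.1, 0.7]], "classification", "f1", registry, dummy, k=2))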
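The cold-start handling, a dynamic, cost-bounded "probe-and-select" strategy, can be pictured as the loop below. This is a plausible reading under stated assumptions, not the paper's algorithm: the cheapest-first selection rule, the probe and cost callbacks, and the scalar budget are illustrative.

def probe_and_select(candidates, probe, cost, budget, k):
    # Cost-bounded probe-and-select sketch for strict cold start:
    # `probe(model_id)` runs a model on the new dataset and returns an
    # observed score; `cost(model_id)` is its probe cost. Probing stops
    # once the budget is exhausted; the k best observed models win.
    observed, spent = {}, 0.0
    remaining = list(candidates)
    while remaining:
        # Selection rule (an assumption): probe the cheapest unprobed
        # model next, so more candidates fit under the budget.
        m = min(remaining, key=cost)
        if spent + cost(m) > budget:
            break
        remaining.remove(m)
        spent += cost(m)
        observed[m] = probe(m)
    return sorted(observed, key=observed.get, reverse=True)[:k]

# Toy usage: three candidates, unit probe costs, budget for two probes.
scores = {"resnet18": 0.81, "xgboost-clf": 0.77, "mlp-small": 0.64}
print(probe_and_select(scores, probe=scores.get, cost=lambda m: 1.0,
                       budget=2.0, k=1))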
Pages: 4