Machine Learning for Data Management: A System View

Times cited: 4
Authors
Li, Guoliang [1]
Zhou, Xuanhe [1]
Affiliation
[1] Tsinghua Univ, Dept Comp Sci, Beijing, Peoples R China
Source
2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022) | 2022
Keywords
data management; machine learning; TUNING SYSTEM; INDEX;
DOI
10.1109/ICDE53745.2022.00297
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Machine learning techniques have been proposed in recent years to optimize data management. Compared with traditional empirical data management, learning-based methods extract knowledge from historical tasks, generalize that knowledge to similar new tasks, and can achieve better performance in many scenarios (e.g., knob tuning and cardinality estimation). However, data management systems must handle diverse and dynamic workloads across different scenarios, so applying machine learning techniques to them raises several challenges. First, given various workloads and hundreds of system metrics, how can we select and characterize effective features for data management problems? Second, given the diversity of machine learning models, how can we design appropriate models? Third, given varied data management requirements, how can we validate whether the machine learning models meet those requirements? In this tutorial, we discuss existing learning-based data management studies and how they address these challenges, and we outline future research directions.
Pages: 3198-3201
Page count: 4