A Survey of Machine Learning Based Database Techniques

Cited by: 0
Authors
Li G.-L. [1]
Zhou X.-H. [1]
Sun J. [1]
Yu X. [1]
Yuan H.-T. [1]
Liu J.-B. [1]
Han Y. [1]
Affiliation
[1] Department of Computer Science, Tsinghua University, Beijing
Source
Jisuanji Xuebao/Chinese Journal of Computers | 2020, Vol. 43, No. 11
Funding
National Natural Science Foundation of China
Keywords
Database; Deep learning; Machine learning; Query optimization; Reinforcement learning
DOI
10.11897/SP.J.1016.2020.02019
Abstract
In the era of big data, traditional database techniques struggle to adapt to ever-expanding data volumes, complex and diverse application scenarios, heterogeneous hardware architectures, and different types of users. Machine learning, known for its learning ability, is therefore showing growing potential and application prospects in databases. Based on a thorough investigation and analysis, we first summarize the needs for machine learning in building an efficient, reliable, highly available, and adaptive database system, covering database operation and maintenance, data storage, the optimizer and executor, query optimization, database workload management, database security and privacy, database self-management, and databases for machine learning. We then discuss the potential challenges of combining machine learning algorithms with database techniques from four aspects: lack of training data, long training time, limited generalization ability, and the difficulty of applying machine learning models to specific database problems. Next, we survey research on machine-learning-based techniques, including automatic parameter tuning, automatic cardinality estimation, automatic query plan selection, and automatic index and view selection. Automatic parameter tuning techniques include heuristic algorithms, traditional machine learning, and deep reinforcement learning.
Heuristic algorithms explore the optimal subspace by sampling from the discrete parameter space, which can effectively improve the efficiency of parameter tuning, but they have difficulty finding an appropriate configuration within the resource limit. Traditional machine learning algorithms learn the mapping between the system state and the specified workload template in a reduced-dimension parameter space, which improves the adaptability of the model. Deep reinforcement learning iteratively learns the optimization strategy in the high-dimensional parameter space and uses neural networks to improve the handling of high-dimensional data, which can effectively reduce the demand for training data. Automatic cardinality estimation includes query-oriented and query-plan-oriented methods. The former use convolutional neural networks (CNNs) to learn the relationships among data, filter conditions, and join conditions, but generalize poorly across datasets. The latter estimate the cardinality of physical operators in a cascade, which improves adaptability to different queries. Query plan selection includes deep learning and reinforcement learning methods. Deep learning methods integrate estimated cost values and data characteristics, which improves the accuracy of each plan's cost estimate, but the results depend heavily on the accuracy of the estimator. Deep reinforcement learning methods iteratively generate the query plan based on the final goal, which reduces the dependence on query cost. Automatic index selection includes classifiers, reinforcement learning, and genetic algorithms. Classification algorithms analyze the cost of building indexes and the efficiency of different indexes based on table characteristics and, combined with a genetic algorithm, improve the efficiency of recommending composite indexes. Reinforcement learning realizes online index selection by incrementally recommending indexes.
Automatic view selection includes heuristic algorithms, probabilistic-statistical methods, and reinforcement learning. Heuristic algorithms improve selection efficiency by greedily exploring the directed acyclic graph of candidate views, but their adaptability is poor. Statistics-based methods formalize view selection as a 0-1 selection problem, effectively reducing the cost of exploring the graph. Reinforcement learning methods model the creation and deletion of views as a dynamic selection process and further improve selection efficiency with a trial-and-error training pattern. Finally, we present the revolutionary breakthroughs that machine learning technologies will bring to databases from eight perspectives. © 2020, Science Press. All rights reserved.
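To make the "0-1 selection problem" framing of view selection concrete, the following is a minimal sketch and not the method of any surveyed system. It assumes each candidate view has a known benefit and a positive integer storage cost, and that views are independent of one another; the function name `select_views`, the tuple layout, and the budget unit are all hypothetical. It solves the resulting problem with a standard knapsack dynamic program, which is a swapped-in illustration rather than the statistical approach the abstract refers to.

```python
def select_views(candidates, budget):
    """Pick a subset of views maximizing total benefit within a storage budget.

    candidates: list of (name, benefit, cost) tuples; cost is a positive integer
                (hypothetical units, e.g. megabytes of storage).
    budget:     non-negative integer storage limit in the same units.
    Returns (total_benefit, tuple_of_chosen_names).
    """
    # best[b] holds the best (benefit, chosen views) using at most b units.
    best = [(0, ())] * (budget + 1)
    for name, benefit, cost in candidates:
        # Walk budgets downward so each view is selected at most once (0-1 choice).
        for b in range(budget, cost - 1, -1):
            with_view = best[b - cost][0] + benefit
            if with_view > best[b][0]:
                best[b] = (with_view, best[b - cost][1] + (name,))
    return best[budget]


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    views = [("v1", 60, 10), ("v2", 100, 20), ("v3", 120, 30)]
    print(select_views(views, 50))
```

In practice the benefit of one materialized view usually depends on which other views exist, so the independence assumption here is a simplification; that interdependence is one reason learned or dynamic selection methods are studied.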
Pages: 2019-2049
Page count: 30