Multi-stage Method for Online Vertical Data Partitioning Based on Spectral Clustering

Authors
Liu P.-J. [1, 2]
Li H.-Y. [1, 2]
Wang T.-Y. [1, 2]
Liu H. [1, 2]
Sun L.-M. [1, 2]
Ren Y.-F. [3]
Li C.-P. [1, 2]
Chen H. [1, 2]
Affiliations
[1] Key Laboratory of Data Engineering and Knowledge Engineering, the Ministry of Education, Renmin University of China, Beijing
[2] School of Information, Renmin University of China, Beijing
[3] Huawei Cloud Database Innovation Lab, Shenzhen
Source
Ruan Jian Xue Bao/Journal of Software | 2023 / Vol. 34 / No. 06
Keywords
frequent itemsets; greedy search; multistage decision; spectral clustering; vertical partitioning
DOI
10.13328/j.cnki.jos.006496
Abstract
Vertical data partitioning stores database table attributes that satisfy certain semantic conditions in the same physical block, which reduces the cost of data access and improves the efficiency of query processing. A query usually touches only some of a table's attributes, so a subset of the attributes is enough to produce accurate query results. A reasonable vertical partitioning lets most queries be answered without scanning the whole table, which reduces the amount of data accessed. Traditional vertical partitioning methods are mainly based on heuristic rules set by experts. Their partitioning granularity is coarse, and they cannot tailor the partitioning to the characteristics of the workload. Moreover, when the workload or the number of attributes becomes large, the execution time of existing methods is too long to meet the performance requirements of online, real-time database tuning. Therefore, a method called spectral clustering based vertical partitioning (SCVP) is proposed for the online environment. It adopts a phased solution to reduce the time complexity of the algorithm and speed up partitioning. First, SCVP reduces the solution space by adding constraints, that is, by generating initial partitions with spectral clustering. Second, SCVP searches the reduced solution space: the initial partitions are optimized by combining frequent itemset mining with greedy search. To further improve the performance of SCVP with high-dimensional attributes, an improved version called spectral clustering based vertical partitioning redesign (SCVP-R) is proposed. SCVP-R optimizes the partition combiner component of SCVP by introducing a sympatric-competition mechanism, a double-elimination mechanism, and a loop mechanism. Experimental results on different datasets show that SCVP and SCVP-R execute faster and perform better than the current state-of-the-art vertical partitioning method. © 2023 Chinese Academy of Sciences. All rights reserved.
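The abstract gives no cost model or pseudocode, so the following is only a minimal sketch of the general idea of improving a set of partitions by greedy search. Everything in it is an assumption made for illustration: the workload, the attribute widths, the fixed per-partition overhead `SEEK_COST`, and the function names `scan_cost` and `greedy_merge`. It starts from one partition per attribute rather than from spectral-clustering output, and it does not include frequent itemset mining, so it is not the SCVP algorithm itself.

```python
from itertools import combinations

# Hypothetical workload: each query is the set of attributes it reads.
WORKLOAD = [{"a", "b"}, {"a", "b"}, {"c", "d"}, {"c", "d"}, {"a", "b", "e"}]
# Hypothetical attribute widths in bytes.
WIDTH = {"a": 4, "b": 4, "c": 8, "d": 8, "e": 16}
# Hypothetical fixed overhead paid each time a query opens a partition.
SEEK_COST = 2


def scan_cost(partitions, workload, width, seek=SEEK_COST):
    """Assumed cost model: a query reads, in full, every partition that
    holds at least one attribute it needs, plus a fixed overhead each."""
    total = 0
    for query in workload:
        for part in partitions:
            if part & query:
                total += sum(width[attr] for attr in part) + seek
    return total


def greedy_merge(partitions, workload, width):
    """Repeatedly apply the pairwise merge that lowers the cost the most;
    stop when no merge improves the cost."""
    parts = [frozenset(p) for p in partitions]
    best = scan_cost(parts, workload, width)
    while True:
        candidate = None
        for i, j in combinations(range(len(parts)), 2):
            merged = [p for k, p in enumerate(parts) if k not in (i, j)]
            merged.append(parts[i] | parts[j])
            cost = scan_cost(merged, workload, width)
            if cost < best:
                best, candidate = cost, merged
        if candidate is None:
            return parts, best
        parts = candidate


column_layout = [{attr} for attr in sorted(WIDTH)]  # one partition per attribute
row_layout = [set(WIDTH)]                           # whole table in one partition
final_parts, final_cost = greedy_merge(column_layout, WORKLOAD, WIDTH)
print(sorted(sorted(p) for p in final_parts), final_cost)
```

Under this toy model the search groups attributes that are always read together and leaves the rarely used wide attribute on its own, which costs less than either the pure column layout or the pure row layout. A greedy search of this kind can stop at a local optimum, which is one reason the quality of the starting partitions matters.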
Pages: 2804-2832
Page count: 28