NUMA-Aware Scalable and Efficient In-Memory Aggregation on Large Domains

被引:10
作者
Wang, Li [1 ]
Zhou, Minqi [1 ]
Zhang, Zhenjie [2 ]
Shan, Ming-Chien [3 ]
Zhou, Aoying [1 ]
机构
[1] E China Normal Univ, Software Engn Inst, Shanghai 200062, Peoples R China
[2] Illinois Singapore Pte Ltd, Adv Digital Sci Ctr, Singapore, Singapore
[3] SAP Res, Palo Alto, CA USA
基金
美国国家科学基金会;
关键词
Aggregation; radix-partitioning; in-memory databases; cache miss;
D O I
10.1109/TKDE.2014.2359675
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Business Intelligence (BI) is recognized as one of the most important IT applications in the coming big data era. In recent years, non-uniform memory access (NUMA) has become the de-facto architecture of multiprocessors on the new generation of enterprise servers. Such new architecture brings new challenges to optimization techniques on traditional operators in BI. Aggregation, for example, is one of the basic building blocks of BI, while its processing performance with existing hash-based algorithms scales poorly in terms of the number of cores under NUMA architecture. In this paper, we provide new solutions to tackle the problem of parallel hash-based aggregation, especially targeting at domains of extremely large cardinality. We propose a NUMA-aware radix partitioning (NaRP) method which divides the original huge relation table into subsets, without invoking expensive remote memory access between nodes of the cores. We also present a new efficient aggregation algorithm (EAA), to aggregate the partitioned data in parallel with low cache coherence miss and locking costs. Theoretical analysis as well as empirical study on an IBM X5 server prove that our proposals are at least two times faster than existing methods.
引用
收藏
页码:1071 / 1084
页数:14
相关论文
共 20 条
[11]  
Cieslewicz John., 2007, VLDB 07, P339
[12]  
DeWitt D. J., 1990, IEEE Transactions on Knowledge and Data Engineering, V2, P44, DOI 10.1109/69.50905
[13]   QUERY EVALUATION TECHNIQUES FOR LARGE DATABASES [J].
GRAEFE, G .
COMPUTING SURVEYS, 1993, 25 (02) :73-170
[14]  
Haas P. J., 1995, VLDB '95. Proceedings of the 21st International Conference on Very Large Data Bases, P311
[15]  
Hennessy John L., 2011, Computer architecture: a quantitative approach
[16]  
Intel, 2009, 320412001US INT
[17]   Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs [J].
Kim, Changkyu ;
Sedlar, Eric ;
Chhugani, Jatin ;
Kaldewey, Tim ;
Nguyen, Anthony D. ;
Di Bias, Andrea ;
Lee, Victor W. ;
Satish, Nadathur ;
Dubey, Pradeep .
PROCEEDINGS OF THE VLDB ENDOWMENT, 2009, 2 (02) :1378-1389
[18]   Memory Performance and Cache Coherency Effects on an Intel Nehalem Multiprocessor System [J].
Molka, Daniel ;
Hackenberg, Daniel ;
Schoene, Robert ;
Mueller, Matthias S. .
18TH INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, PROCEEDINGS, 2009, :261-270
[19]  
Shatdal A., 1995, SIGMOD Record, V24, P104, DOI 10.1145/568271.223801
[20]   FALSE SHARING AND SPATIAL LOCALITY IN MULTIPROCESSOR CACHES [J].
TORRELLAS, J ;
LAM, MS ;
HENNESSY, JL .
IEEE TRANSACTIONS ON COMPUTERS, 1994, 43 (06) :651-663