CellJoin: a parallel stream join operator for the cell processor

Cited by: 31
Authors
Gedik, Bugra [1]
Bordawekar, Rajesh R. [1]
Yu, Philip S. [2]
Affiliations
[1] Thomas J. Watson Research Center, IBM Research, Hawthorne, NY 10532, USA
[2] University of Illinois, Department of Computer Science, Chicago, IL 60607, USA
Keywords
Cytology; Data handling; Computer architecture; Cells; Information management; Data transfer
DOI
10.1007/s00778-008-0116-z
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Low-latency and high-throughput processing are key requirements of data stream management systems (DSMSs). Hence, multi-core processors that provide high aggregate processing capacity are ideal matches for executing costly DSMS operators. The recently developed Cell processor is a good example of a heterogeneous multi-core architecture and provides a powerful platform for executing data stream operators with high performance. On the downside, exploiting the full potential of a multi-core processor like Cell is often challenging, mainly due to the heterogeneous nature of the processing elements, the software-managed local memory at the co-processor side, and the unconventional programming model in general. In this paper, we study the problem of scalable execution of windowed stream join operators on multi-core processors, and specifically on the Cell processor. By examining various aspects of join execution flow, we determine the right set of techniques to apply in order to minimize the sequential segments and maximize parallelism. Concretely, we show that basic windows coupled with low-overhead pointer-shifting techniques can be used to achieve efficient join window partitioning, column-oriented join window organization can be used to minimize scattered data transfers, delay-optimized double buffering can be used for effective pipelining, rate-aware batching can be used to balance join throughput and tuple delay, and finally single-instruction multiple-data (SIMD) optimized operator code can be used to exploit data parallelism.
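The first two techniques named above, basic windows with pointer shifting and column-oriented window storage, can be illustrated with a small model. This is a hypothetical Python sketch, not the authors' Cell/SPE code: the class name `BasicWindowRing`, the round-robin assignment of basic windows to workers, the equality join predicate, and the assumption that timestamps arrive in non-decreasing order are all illustrative choices.

```python
class BasicWindowRing:
    """Hypothetical sketch: one join window stored as a ring of basic windows.

    Each basic window covers `basic_span` seconds and keeps its tuples in two
    parallel column lists (timestamps and join keys). Sliding the window only
    advances the `head` index and clears the slot that held the oldest basic
    window, so no surviving tuples are copied.
    """

    def __init__(self, num_basic, basic_span):
        self.num_basic = num_basic
        self.basic_span = basic_span
        self.ts_cols = [[] for _ in range(num_basic)]   # timestamp column per slot
        self.key_cols = [[] for _ in range(num_basic)]  # join-key column per slot
        self.head = 0          # slot index of the newest basic window
        self.head_start = 0.0  # start time of the newest basic window

    def insert(self, ts, key):
        # Assumes non-decreasing timestamps. Shift the head forward until ts
        # fits; each slot the head moves onto held the oldest basic window,
        # which is expired by clearing it.
        while ts >= self.head_start + self.basic_span:
            self.head = (self.head + 1) % self.num_basic
            self.ts_cols[self.head] = []
            self.key_cols[self.head] = []
            self.head_start += self.basic_span
        self.ts_cols[self.head].append(ts)
        self.key_cols[self.head].append(key)

    def slots_newest_first(self):
        # Logical age order is derived from the head pointer alone.
        return [(self.head - i) % self.num_basic for i in range(self.num_basic)]

    def partition(self, num_workers):
        # Illustrative round-robin split of basic windows across workers
        # (e.g. one worker per co-processor); only slot indices are handed out.
        order = self.slots_newest_first()
        return [order[w::num_workers] for w in range(num_workers)]

    def probe(self, key, slots):
        # Equality-join probe that scans only the key column of the given slots.
        return sum(self.key_cols[s].count(key) for s in slots)


w = BasicWindowRing(num_basic=4, basic_span=10.0)
for ts, key in [(1, "a"), (5, "b"), (12, "a"), (25, "a"), (31, "c")]:
    w.insert(ts, key)
parts = w.partition(2)
print(parts)                                      # [[3, 1], [2, 0]]
print(sum(w.probe("a", part) for part in parts))  # 3
```

In this sketch a worker's share of the window is just a list of slot indices, so re-partitioning after a slide moves no tuple data, and probing touches only the key column rather than whole tuples.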
Our experimental results show that, following the design guidelines and implementation techniques outlined in this paper, windowed stream joins can achieve high scalability (linear in the number of co-processors) by making efficient use of the extensive hardware parallelism provided by the Cell processor (reaching data processing rates of approximately 13 GB/s) and significantly surpass the performance obtained from conventional high-end processors (supporting a combined input stream rate of 2,000 tuples/s using 15 min windows and without dropping any tuples, resulting in an approximately 8.3 times higher output rate compared to an SSE implementation on dual 3.2 GHz Intel Xeon).
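To give a sense of the workload behind the 2,000 tuples/s, 15-minute-window figure, here is a back-of-envelope estimate. It rests on two assumptions that are not stated in the record: the combined rate is split evenly between the two input streams, and the join is evaluated nested-loop style, with every arriving tuple compared against every tuple in the opposite window. The function name is hypothetical.

```python
def nested_loop_comparisons_per_sec(combined_rate, window_sec):
    """Hypothetical estimate of join comparisons per second.

    Assumes an even split of `combined_rate` between two streams and that each
    arriving tuple is compared with every tuple in the opposite join window.
    """
    per_stream_rate = combined_rate / 2           # assumption: even split
    window_tuples = per_stream_rate * window_sec  # tuples held in each window
    return combined_rate * window_tuples          # comparisons per second


print(nested_loop_comparisons_per_sec(2000, 15 * 60))  # 1800000000.0
```

Under these assumptions each window holds 900,000 tuples and the operator must sustain about 1.8 × 10^9 tuple comparisons per second, which is the kind of load the co-processor parallelism and SIMD code described above are meant to absorb.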
Pages: 501-519
Number of pages: 19