Discrete Fourier Transform on Multicore A review of optimizations necessary for good multicore performance

被引:46
作者
Franchetti, Franz [1 ,2 ]
Pueschel, Markus [1 ]
Voronenko, Yevgen [1 ]
Chellappa, Srinivas [1 ]
Moura, Jose M. F. [3 ]
机构
[1] Carnegie Mellon Univ, ECE Dept, Pittsburgh, PA 15213 USA
[2] Carnegie Mellon Univ, Inst Anal & Sci Comp, Pittsburgh, PA 15213 USA
[3] Carnegie Mellon Univ, Ctr Sensed Crit Infrastruct Res, Pittsburgh, PA 15213 USA
基金
奥地利科学基金会; 美国安德鲁·梅隆基金会; 美国国家科学基金会;
关键词
Discrete Fourier transforms; Multicore processing; Optimization; Signal processing algorithms;
D O I
10.1109/MSP.2009.934155
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
Particle gives an overview on the techniques needed to implement the discrete Fourier transform (DFT) efficiently on current multicore systems, The focus is on Intel-compatible multicores, but we also discuss the IBM Cell and, briefly, graphics, processing units (GPUs). The performance optimization is broken down into three key challenges: parallelization, vectorization, and memory hierarchy optimization. In each case, we use the Kronecker product formalism to formally derive the necessary algorithmic transformations based on a few hardware parameters. Further code-level optimizations are discussed. The rigorous nature of this framework enables the complete automation of the implementation task as shown by the program generator Spiral. Finally, we show and analyze DFT benchmarks of the fastest libraries available for the considered platforms.
引用
收藏
页码:90 / 102
页数:13
相关论文
共 47 条
[1]  
Ali A., 2007, ICS '07: Proceedings of the 21st anjiual international conference on Supercomputing, P293
[2]  
ANIMPLEMENTATIO. T, 1911, P INT WORKSHOP, V1, P1
[3]  
BADERANDV DA, 2007, P INT C ILIQH PERIBR, V1, P172
[4]  
BAILEY DH, 1990, P BIT C ACOUSTICS, P548
[5]  
BLAKE G, 2009, LLFFS QNALLLROCESSIT, V6, P26
[6]  
CHELLAPPA S, 2009, P INT C SUPERCON, V26, P35
[7]  
CHELLAPPA S, 1935, FORINATIONAL TECHNIQ, V111, P111
[8]  
CHOW AC, 2005, REP MAY 2005 ONLINEL, V1, P113
[9]   AN ALGORITHM FOR MACHINE CALCULATION OF COMPLEX FOURIER SERIES [J].
COOLEY, JW ;
TUKEY, JW .
MATHEMATICS OF COMPUTATION, 1965, 19 (90) :297-&
[10]  
CORP I, 2008, INTEL INTEGRATED PER