A Framework for Low-Communication 1-D FFT

被引:0
作者
Tang, Ping Tak Peter [1 ]
park, Jongsoo [1 ]
Kim, Daehyun [1 ]
Petrov, Vladimir [1 ]
机构
[1] Intel Corp, Santa Clara, CA 95051 USA
来源
2012 INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC) | 2012年
关键词
FAST FOURIER-TRANSFORMS; IMPLEMENTATION; ALGORITHM;
D O I
暂无
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
In high-performance computing on distributed-memory systems, communication often represents a significant part of the overall execution time. The relative cost of communication will certainly continue to rise as compute-density growth follows the current technology and industry trends. Design of lower-communication alternatives to fundamental computational algorithms has become an important field of research. For distributed 1-D FFT, communication cost has hitherto remained high as all industry-standard implementations perform three all-to-all internode data exchanges (also called global transposes). These communication steps indeed dominate execution time. In this paper, we present a mathematical framework from which many single-all-to-all and easy-to-implement 1-D FFT algorithms can be derived. For large-scale problems, our implementation can be twice as fast as leading FFT libraries on state-of-the-art computer clusters. Moreover, our framework allows tradeoff between accuracy and performance, further boosting performance if reduced accuracy is acceptable.
引用
收藏
页数:12
相关论文
共 35 条
  • [1] Agullo E., 2010, P IEEE INT S PAR DIS
  • [2] Communication efficient adaptive matrix transpose algorithm for FFT on symmetric multiprocessors
    AL Na'mneh, R
    Pan, WD
    Adhami, R
    [J]. Proceedings of the Thirty-Seventh Southeastern Symposium on System Theory, 2005, : 312 - 315
  • [3] [Anonymous], 1992, COMPUTATIONAL FRAMEW
  • [4] [Anonymous], 1989, Algorithms for Discrete Fourier Transform and Convolution
  • [5] FFTS IN EXTERNAL OR HIERARCHICAL MEMORY
    BAILEY, DH
    [J]. JOURNAL OF SUPERCOMPUTING, 1990, 4 (01) : 23 - 35
  • [6] Ballard G., 2011, P 23 ACM S PAR ALG A
  • [7] MINIMIZING COMMUNICATION IN NUMERICAL LINEAR ALGEBRA
    Ballard, Grey
    Demmel, James
    Holtz, Olga
    Schwartz, Oded
    [J]. SIAM JOURNAL ON MATRIX ANALYSIS AND APPLICATIONS, 2011, 32 (03) : 866 - 901
  • [8] Sampling multipliers and the Poisson summation formula
    Benedetto, JJ
    Zimmermann, G
    [J]. JOURNAL OF FOURIER ANALYSIS AND APPLICATIONS, 1997, 3 (05) : 505 - 523
  • [9] Brigham E. O., 1988, FAST FOURIER TRANSFO
  • [10] Prescribed error tolerances within fixed computational times for scattering problems of arbitrarily high frequency: the convex case
    Bruno, OP
    Geuzaine, CA
    Monro, JA
    Reitich, F
    [J]. PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES, 2004, 362 (1816): : 629 - 645