Tera-Scale 1D FFT with Low-Communication Algorithm and Intel® Xeon Phi™ Coprocessors

Cited by: 3
Authors
Park, Jongsoo [1]
Bikshandi, Ganesh [1]
Vaidyanathan, Karthikeyan [1]
Tang, Ping Tak Peter [2]
Dubey, Pradeep [1]
Kim, Daehyun [1]
Affiliations
[1] Intel Corp, Parallel Comp Lab, Santa Clara, CA 95051 USA
[2] Intel Corp, Software & Serv Grp, Santa Clara, CA 95051 USA
Source
2013 INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC) | 2013
Keywords
Bandwidth Optimizations; Communication-Avoiding Algorithms; FFT; Wide-Vector Many-Core Processors; Xeon Phi;
DOI
10.1145/2503210.2503242
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
This paper demonstrates the first tera-scale performance of Intel® Xeon Phi™ coprocessors on 1D FFT computations. Applying a disciplined performance programming methodology of sound algorithm choice, valid performance model, and well-executed optimizations, we break the teraflop mark on a mere 64 nodes of Xeon Phi and reach 6.7 TFLOPS with 512 nodes, which is 1.5x what is achievable on the same number of Intel® Xeon® nodes. It is a challenge to fully utilize the compute capability presented by many-core wide-vector processors for bandwidth-bound FFT computation. We leverage a new algorithm, Segment-of-Interest FFT, with low inter-node communication cost, and aggressively optimize data movements in node-local computations, exploiting caches. Our coordination of a low-communication algorithm and a massively parallel architecture for scalable performance is not limited to running FFT on Xeon Phi; it can serve as a reference for other bandwidth-bound computations and for emerging HPC systems that are increasingly communication-limited.
Pages: 12
Related papers
4 in total
  • [1] A Framework for Low-Communication 1-D FFT
    Tang, Ping Tak Peter
    Park, Jongsoo
    Kim, Daehyun
    Petrov, Vladimir
    2012 INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC), 2012
  • [2] A framework for low-communication 1-D FFT
    Tang, Ping Tak Peter
    Park, Jongsoo
    Kim, Daehyun
    Petrov, Vladimir
    SCIENTIFIC PROGRAMMING, 2013, 21 (3-4) : 181 - 195
  • [3] An Implementation of Parallel 1-D Real FFT on Intel Xeon Phi Processors
    Takahashi, Daisuke
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2017, PT I, 2017, 10404 : 401 - 410
  • [4] Implementation of low communication frequency 3D FFT algorithm for ultra-large-scale micromagnetics simulation
    Tsukahara, Hiroshi
    Iwano, Kaoru
    Mitsumata, Chiharu
    Ishikawa, Tadashi
    Ono, Kanta
    COMPUTER PHYSICS COMMUNICATIONS, 2016, 207 : 217 - 220