Using DCT-based Approximate Communication to Improve MPI Performance in Parallel Clusters

Citations: 3
Authors
Fan, Qianqian [1 ]
Lilja, David J. [1 ]
Sapatnekar, Sachin S. [1 ]
Affiliation
[1] Univ Minnesota, Dept Elect & Comp Engn, Minneapolis, MN 55455 USA
Source
2019 IEEE 38TH INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCCC) | 2019
Funding
National Science Foundation (USA)
Keywords
COMPRESSION; ALGORITHM
DOI
10.1109/ipccc47392.2019.8958720
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Communication overheads in distributed systems constitute a large fraction of total execution time and limit the scalability of applications running on these systems. We propose a DCT-based approximate communication scheme that exploits the error resilience of several widely used applications and improves communication efficiency by substantially reducing message lengths. The scheme is implemented in the Message Passing Interface (MPI) library. Evaluated on several representative MPI applications on a real cluster system, it reduces the fraction of total execution time devoted to communication from 59% to 23%, even after accounting for the computational overhead of DCT encoding. For many communication-intensive applications, the approximate communication scheme shortens total execution time without much loss in the quality of the result.
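The general idea of shortening a message with a DCT can be illustrated with a minimal sketch. This is not the authors' MPI implementation: the function names (`dct`, `idct`, `encode`, `decode`), the block length of 8, and the rule of keeping only the first `keep` coefficients are all assumptions made for illustration, and no MPI calls are shown. The sender transforms a block of values, transmits only the low-frequency coefficients, and the receiver reconstructs an approximation.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a list of floats (direct O(n^2) form)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct(c, n):
    """Inverse of dct(); coefficients beyond len(c) are treated as zero."""
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n) if c else 0.0
        for k in range(1, len(c)):
            s += c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out


def encode(block, keep):
    """Hypothetical sender side: keep only the first `keep` coefficients."""
    return dct(block)[:keep]


def decode(coeffs, n):
    """Hypothetical receiver side: rebuild an n-value approximation."""
    return idct(coeffs, n)


# A smooth 8-value block, standing in for error-resilient application data.
block = [math.sin(0.2 * i) for i in range(8)]
payload = encode(block, keep=4)          # 4 values sent instead of 8
approx = decode(payload, len(block))
max_err = max(abs(a - b) for a, b in zip(block, approx))
print(len(payload), max_err)
```

In this sketch the message length is halved and the reconstruction error depends on how smooth the data is. How many coefficients can be dropped for a given application, and how they are encoded on the wire, are design choices the abstract does not specify.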
Pages: 10