A Data-centric Profiler for Parallel Programs

Times cited: 16
Authors
Liu, Xu [1]
Mellor-Crummey, John [1]
Affiliations
[1] Rice Univ, Dept Comp Sci, Houston, TX 77005 USA
Source
2013 INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC) | 2013
Keywords
Data-centric profiling; scalable profiler; data locality;
DOI
10.1145/2503210.2503297
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
It is difficult to manually identify opportunities for enhancing data locality. To address this problem, we extended the HPCToolkit performance tools to support data-centric profiling of scalable parallel programs. Our tool uses hardware counters to directly measure memory access latency and attributes latency metrics to both variables and instructions. Different hardware counters provide insight into different aspects of data locality (or lack thereof). Unlike prior tools for data-centric analysis, our tool employs scalable measurement, analysis, and presentation methods that enable it to analyze the memory access behavior of scalable parallel programs with low runtime and space overhead. We demonstrate the utility of HPCToolkit's new data-centric analysis capabilities with case studies of five well-known benchmarks. In each benchmark, we identify performance bottlenecks caused by poor data locality and demonstrate non-trivial performance optimizations enabled by this guidance.
Pages: 12