PATHA: Performance Analysis Tool for HPC Applications

Cited by: 0

Authors:

Yoo, Wucherl [1]

Koo, Michelle [2]

Cao, Yi [3]

Sim, Alex [1]

Nugent, Peter [1, 2]

Wu, Kesheng [1]

Affiliations:

[1] Univ Calif Berkeley, Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA

[2] Univ Calif Berkeley, Berkeley, CA 94720 USA

[3] CALTECH, Pasadena, CA 91125 USA

Source:

2015 IEEE 34TH INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCCC) | 2015

Keywords:

Performance analysis; Performance evaluation; High performance computing

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, methods]

Discipline classification code:

081202

Abstract:

Large science projects rely on complex workflows to analyze terabytes or petabytes of data. These jobs often run over thousands of CPU cores and simultaneously perform data accesses, data movements, and computation. It is difficult to identify bottlenecks or to debug performance issues in these large workflows. To address these challenges, we have developed the Performance Analysis Tool for HPC Applications (PATHA) using state-of-the-art open source big data processing tools. Our framework can ingest system logs to extract key performance measures, and apply sophisticated statistical tools and data mining methods to the performance data. It uses an efficient data processing engine to allow users to interactively analyze large volumes of different types of logs and measurements. To illustrate the functionality of PATHA, we conduct a case study on the workflows from an astronomy project known as the Palomar Transient Factory (PTF). Our study processed 1.6 TB of system logs collected on the NERSC supercomputer Edison. Using PATHA, we were able to identify performance bottlenecks, which reside in three tasks of the PTF workflow and depend on the density of celestial objects.


Pages: 8
