PATHA: Performance Analysis Tool for HPC Applications

Cited by: 0

Authors:

Yoo, Wucherl [1]

Koo, Michelle [2]

Cao, Yi [3]

Sim, Alex [1]

Nugent, Peter [1,2]

Wu, Kesheng [1]

Affiliations:

[1] Univ Calif Berkeley, Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA

[2] Univ Calif Berkeley, Berkeley, CA 94720 USA

[3] CALTECH, Pasadena, CA 91125 USA

Source:

2015 IEEE 34th International Performance Computing and Communications Conference (IPCCC) | 2015

Keywords:

Performance analysis; Performance evaluation; High performance computing

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, methods]

Discipline classification code:

081202

Abstract:

Large science projects rely on complex workflows to analyze terabytes or petabytes of data. These jobs often run over thousands of CPU cores and simultaneously perform data accesses, data movements, and computation. It is difficult to identify bottlenecks or to debug performance issues in these large workflows. To address these challenges, we have developed the Performance Analysis Tool for HPC Applications (PATHA) using state-of-the-art open source big data processing tools. Our framework can ingest system logs to extract key performance measures and apply sophisticated statistical tools and data mining methods to the performance data. It uses an efficient data processing engine to let users interactively analyze large volumes of different types of logs and measurements. To illustrate the functionality of PATHA, we conduct a case study on the workflows from an astronomy project known as the Palomar Transient Factory (PTF). Our study processed 1.6 TB of system logs collected on the NERSC supercomputer Edison. Using PATHA, we were able to identify performance bottlenecks, which reside in three tasks of the PTF workflow and depend on the density of celestial objects.


Pages: 8
