PATHA: Performance Analysis Tool for HPC Applications

Cited by: 0
Authors
Yoo, Wucherl [1]
Koo, Michelle [2]
Cao, Yi [3]
Sim, Alex [1]
Nugent, Peter [1, 2]
Wu, Kesheng [1]
Affiliations
[1] Univ Calif Berkeley, Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Berkeley, CA 94720 USA
[3] CALTECH, Pasadena, CA 91125 USA
Source
2015 IEEE 34th International Performance Computing and Communications Conference (IPCCC) | 2015
Keywords
Performance analysis; Performance evaluation; High performance computing;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Large science projects rely on complex workflows to analyze terabytes or petabytes of data. These jobs often run on thousands of CPU cores while simultaneously performing data accesses, data movements, and computation. It is difficult to identify bottlenecks or debug performance issues in such large workflows. To address these challenges, we have developed the Performance Analysis Tool for HPC Applications (PATHA) using state-of-the-art open source big data processing tools. Our framework can ingest system logs to extract key performance measures and apply sophisticated statistical tools and data mining methods to the performance data. It uses an efficient data processing engine that lets users interactively analyze large volumes of different types of logs and measurements. To illustrate the functionality of PATHA, we conduct a case study on the workflows of an astronomy project known as the Palomar Transient Factory (PTF). Our study processed 1.6 TB of system logs collected on the NERSC supercomputer Edison. Using PATHA, we were able to identify performance bottlenecks, which reside in three tasks of the PTF workflow that depend on the density of celestial objects.
Pages: 8