TAUoverSupermon: Low-Overhead Online Parallel Performance Monitoring

Times cited: 0
Authors
Nataraj, Aroon [1 ]
Sottile, Matthew [2 ]
Morris, Alan [1 ]
Malony, Allen D. [1 ]
Shende, Sameer [1 ]
Affiliations
[1] Univ Oregon, Dept Comp & Informat Sci, Eugene, OR 97403 USA
[2] Los Alamos Natl Lab, Los Alamos, NM USA
Source
EURO-PAR 2007 PARALLEL PROCESSING, PROCEEDINGS | 2007 / Vol. 4641
Keywords
online performance measurement; cluster monitoring
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Online application performance monitoring allows tracking performance characteristics during execution as opposed to doing so post-mortem. This opens up several possibilities otherwise unavailable such as real-time visualization and application performance steering that can be useful in the context of long-running applications. As HPC systems grow in size and complexity, the key challenge is to keep the online performance monitor scalable and low overhead while still providing a useful performance reporting capability. Two fundamental components that constitute such a performance monitor are the measurement and transport systems. We adapt and combine two existing, mature systems - TAU and Supermon - to address this problem. TAU performs the measurement while Supermon is used to collect the distributed measurement state. Our experiments show that this novel approach leads to very low-overhead application monitoring as well as other benefits unavailable from using a transport such as NFS.
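The abstract splits an online monitor into a measurement system (TAU) and a transport system (Supermon). The Python sketch below illustrates that split only in outline, under stated assumptions: `Profiler`, `timed`, `snapshot`, and `collect` are hypothetical names, not TAU or Supermon APIs, and the "nodes" are in-process objects standing in for remote machines. The measurement side only updates in-memory counters, while a separate transport step pulls a snapshot from each node and merges them, rather than having every process write profile files to a shared file system such as NFS.

```python
import time
from collections import defaultdict


class Profiler:
    """Hypothetical measurement component: per-routine call counts and inclusive time."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self.calls = defaultdict(int)
        self.inclusive = defaultdict(float)

    def timed(self, name):
        """Return a context manager that times one call to routine `name`."""
        profiler = self

        class _Timer:
            def __enter__(self):
                self.start = profiler._clock()
                return self

            def __exit__(self, *exc):
                # Only cheap in-memory updates happen on the application's path.
                profiler.calls[name] += 1
                profiler.inclusive[name] += profiler._clock() - self.start
                return False  # do not suppress exceptions

        return _Timer()

    def snapshot(self):
        """Copy the current state so the transport can ship it while the app keeps running."""
        return {n: (self.calls[n], self.inclusive[n]) for n in self.calls}


def collect(node_profilers):
    """Hypothetical transport step: pull one snapshot per node and merge by routine name."""
    merged = defaultdict(lambda: [0, 0.0])
    for prof in node_profilers.values():
        for name, (calls, incl) in prof.snapshot().items():
            merged[name][0] += calls
            merged[name][1] += incl
    return {name: tuple(v) for name, v in merged.items()}
```

Keeping the measurement path to a few counter updates and moving all serialization and aggregation into the periodic pull is one way to keep per-event overhead low; how TAU and Supermon actually divide that work is described in the paper itself, not in this sketch.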
Pages: 85 / +
Page count: 3