TAUoverSupermon: Low-Overhead Online Parallel Performance Monitoring

Times cited: 0
Authors
Nataraj, Aroon [1]
Sottile, Matthew [2]
Morris, Alan [1]
Malony, Allen D. [1]
Shende, Sameer [1]
Affiliations
[1] Univ Oregon, Dept Comp & Informat Sci, Eugene, OR 97403 USA
[2] Los Alamos Natl Lab, Los Alamos, NM USA
Source
EURO-PAR 2007 PARALLEL PROCESSING, PROCEEDINGS | 2007 / Vol. 4641
Keywords
online performance measurement; cluster monitoring
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Online application performance monitoring tracks performance characteristics during execution rather than post-mortem. This enables capabilities that are otherwise unavailable, such as real-time visualization and application performance steering, which are particularly useful for long-running applications. As HPC systems grow in size and complexity, the key challenge is to keep the online performance monitor scalable and low-overhead while still providing a useful performance reporting capability. Such a monitor consists of two fundamental components: a measurement system and a transport system. To address this problem, we adapt and combine two existing, mature systems, TAU and Supermon. TAU performs the measurement, while Supermon collects the distributed measurement state. Our experiments show that this novel approach yields very low-overhead application monitoring, along with other benefits that are unavailable when using a transport such as NFS.
Pages: 85 / +
Number of pages: 3