High Performance Cluster Monitoring System

Cited by: 0
Authors
Jiang, Xunfei [1]
Baigalmaa, Tuguldur [1]
Lam Nguyen [1]
Akiyoshi, Daiki [1]
Ramthun, Eli [1]
Parajuli, Niraj [1]
Peck, Charles [1]
Affiliation
[1] Earlham Coll, Richmond, IN 47374 USA
Source
2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2018
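Keywords
DOI
Not available
Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic and communication technology];
Discipline classification code
0808; 0809
Abstract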
System monitoring is an important basis for system modelling and improvement. To achieve higher performance efficiency and lower energy consumption in cluster systems, we design a monitoring system that tracks the performance and temperature of clusters for education and research purposes. The total energy consumption of a cluster can be estimated from the performance and temperature data. Specifically, in our proposed system, all real-time data (including the temperature and activity of the components of each computing node) are collected and stored in Round-Robin Databases (RRDs). These data can be visualized or downloaded through a user-friendly interface for further analysis. Moreover, our system provides a runtime comparison feature, which allows users to compare the performance of a running experiment with historical experimental results without waiting for the experiment to complete. The data visualization and user interfaces of the monitoring system are demonstrated with an experiment on our cluster system.
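The two mechanisms named in the abstract, round-robin storage and runtime comparison, can be illustrated with a minimal sketch. This is not the authors' implementation and does not use any real RRD library: `RoundRobinSeries` and `compare_at_runtime` are hypothetical names, the fixed-capacity buffer only mimics the overwrite-oldest behaviour of a round-robin database, and comparing mean values over the elapsed prefix is an assumed comparison metric.

```python
from collections import deque


class RoundRobinSeries:
    """Fixed-capacity time series (hypothetical stand-in for an RRD archive).

    Once `capacity` samples are stored, each new sample evicts the oldest one,
    so storage size stays constant no matter how long monitoring runs.
    """

    def __init__(self, capacity):
        self.samples = deque(maxlen=capacity)

    def record(self, timestamp, value):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self.samples.append((timestamp, value))

    def values(self):
        return [value for _, value in self.samples]


def compare_at_runtime(running, historical):
    """Return mean(running) minus the mean of the matching historical prefix.

    Assumption: both runs are sampled at the same interval, so the first
    len(running) historical samples cover the same elapsed time.
    """
    n = len(running)
    prefix = historical[:n]
    if n == 0 or len(prefix) < n:
        raise ValueError("need a non-empty run and enough historical samples")
    return sum(running) / n - sum(prefix) / n


if __name__ == "__main__":
    cpu_temp = RoundRobinSeries(capacity=3)
    for t in range(4):
        cpu_temp.record(t, float(t))
    print(cpu_temp.values())  # the sample recorded at t=0 has been evicted

    # A running experiment with two samples so far versus a finished run.
    print(compare_at_runtime([50.0, 52.0], [48.0, 50.0, 55.0]))
```

Because the comparison only needs the prefix of the historical series that matches the elapsed samples, it can be evaluated while the experiment is still running.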
Pages: 1188-1193
Page count: 6