ObsCon: Integrated Monitoring and Control for Parallel, Real-time Applications

Cited by: 0
Authors
Nussbaum, Alan [1]
Choodamani, Shwetha Mathangi Chandra [2]
Schwan, Karsten [2]
Affiliations
[1] Georgia Inst Technol, Coll Comp, Georgia Tech Res Inst, Atlanta, GA 30332 USA
[2] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
Source
2015 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING - CLUSTER 2015 | 2015
Keywords
Application Tuning; Monitoring; Closed-loop control; Middleware; Real-time Systems; Distributed Architectures;
DOI
10.1109/CLUSTER.2015.72
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
A large class of emerging compute-intensive applications demands real-time or near-real-time processing guarantees on streaming data. Sensor processing, in particular, has stringent latency requirements for the digital processing of rapidly incoming radar data streams. The consequent demands on the cluster middleware used to run such codes include (i) efficient online observation of current application performance, coupled with (ii) highly responsive controllers able to dynamically adjust the application's input- and data-dependent runtime behavior. We present the Obs(erver)Con(troller) software for online monitoring and control, which, based on specifications of acceptable application states and tunable knobs within the execution environment, ensures that application performance falls within acceptable limits. ObsCon topologies are dynamic, making possible the runtime association of ObsCon methods with arbitrary DAG-structured, distributed/parallel stream processing applications running on high-end cluster machines. This paper describes the ObsCon software and its 'grey box' use with a high-performance cluster code that exports to ObsCon select 'hooks' for online monitoring and control: Adaptive Digital Beamforming for a phased-array radar system.
Pages: 474-477
Page count: 4