Facilitating HPC operation and administration via cloud

Cited by: 2
Authors
Sha C. [1, 2]
Zhang J. [1]
An L. [1]
Zhang Y. [1]
Wang Z. [3]
Ilijaš T. [4]
Bat N. [4]
Verlič M. [4]
Ji Q. [1]
Affiliations
[1] Sugon Information Industry Co, Beijing
[2] School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing
[3] High School Affiliated to Renmin University, Beijing
[4] Arctur računalniški inženiring d.o.o., Nova Gorica
Source
Supercomputing Frontiers and Innovations | 2019 / Vol. 6 / No. 1
Keywords
Administration; Cloud; EasyOP; HPC; Monitoring; Notifications; Operation; Supercomputer
DOI
10.14529/jsfi190105
Abstract
Cloud Computing, which is experiencing tremendous growth, offers a number of advantages over other distributed platforms. Bringing these advantages to High Performance Computing (HPC) has also driven the development of HPCaaS (HPC as a Service), which has mainly focused on flexible access to resources, cost-effectiveness, and freedom from maintenance for end users. Beyond providing and using HPCaaS, HPC centers could leverage Cloud Computing technology further, for instance to facilitate the operation and administration of deployed HPC systems, a task faced by most supercomputer centers. This paper presents EasyOP, a product developed to realize the idea that one or more Cloud or HPC facilities can be run from a centralized, unified control platform. The core function of EasyOP is to send information on HPC system hardware and system software, failure alarms, job scheduling, etc. to the Wuxi cloud computing center. After this information is analyzed and processed, valuable data, including alarm and job scheduling status, can be shared with HPC users via SMS, email, and WeChat. More importantly, using the data accumulated at the cloud computing center, EasyOP offers several easy-to-use functions, such as user management, monthly/yearly reports, and one-screen monitoring. By the end of 2016, EasyOP was serving more than 50 HPC systems with almost 10,000 nodes and over 300 regular users. © The Authors 2019.
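The collect-analyze-notify flow described in the abstract can be illustrated with a minimal sketch. This is not EasyOP's actual implementation: the class names (`Event`, `Subscription`, `CentralHub`), the numeric severity scale, and per-user channel subscriptions are all hypothetical, and real SMS, email, and WeChat delivery is replaced by injected callables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# All names below are hypothetical; they illustrate the data flow in the
# abstract, not EasyOP's real interfaces.

@dataclass
class Event:
    system_id: str  # HPC system that reported the event
    kind: str       # e.g. "alarm", "job", "hardware"
    severity: int   # assumed scale: 0 = info ... 3 = critical
    message: str

@dataclass
class Subscription:
    user: str
    system_ids: Set[str]   # systems this user cares about
    channels: List[str]    # e.g. ["sms", "email", "wechat"]
    min_severity: int = 2  # only forward events at or above this level

Sender = Callable[[str, str], None]  # (user, text) -> None

class CentralHub:
    """Hypothetical central platform: stores every reported event and
    forwards matching ones to each subscriber's notification channels."""

    def __init__(self, senders: Dict[str, Sender]) -> None:
        self.senders = senders          # channel name -> delivery function
        self.subscriptions: List[Subscription] = []
        self.history: List[Event] = []  # accumulated data for later reports

    def subscribe(self, sub: Subscription) -> None:
        self.subscriptions.append(sub)

    def ingest(self, event: Event) -> int:
        """Record the event and return how many notifications were sent."""
        self.history.append(event)
        sent = 0
        for sub in self.subscriptions:
            if event.system_id not in sub.system_ids:
                continue
            if event.severity < sub.min_severity:
                continue
            for channel in sub.channels:
                send = self.senders.get(channel)
                if send is not None:  # skip channels with no configured sender
                    send(sub.user, f"[{event.system_id}] {event.message}")
                    sent += 1
        return sent

    def events_per_system(self) -> Dict[str, int]:
        """Simple aggregate over stored history, e.g. input to a monthly report."""
        counts: Dict[str, int] = {}
        for e in self.history:
            counts[e.system_id] = counts.get(e.system_id, 0) + 1
        return counts

# Usage: stand-in senders record messages instead of calling real gateways.
outbox: List[tuple] = []
senders: Dict[str, Sender] = {
    ch: (lambda user, text, ch=ch: outbox.append((ch, user, text)))
    for ch in ("sms", "email", "wechat")
}
hub = CentralHub(senders)
hub.subscribe(Subscription("alice", {"cluster-a"}, ["sms", "wechat"]))

sent_alarm = hub.ingest(Event("cluster-a", "alarm", 3, "node c01 power failure"))
sent_job = hub.ingest(Event("cluster-a", "job", 0, "job 42 finished"))
sent_other = hub.ingest(Event("cluster-b", "alarm", 3, "disk full"))
```

Here the critical alarm reaches the subscriber on both configured channels, while the low-severity job event and the event from an unsubscribed system are stored but not forwarded. Keeping the full event history on the central side, rather than only relaying alarms, is what lets the same platform derive aggregate views such as periodic reports.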
Pages: 23-35
Page count: 12