Paving the Way Toward Energy-Aware and Automated Datacentre

Cited by: 21
Authors
Bartolini, Andrea [1]
Beneventi, Francesco [1]
Borghesi, Andrea [2]
Cesarini, Daniele [1]
Libri, Antonio [3]
Benini, Luca [3]
Cavazzoni, Carlo [4]
Affiliations
[1] Univ Bologna, DEI, Bologna, Italy
[2] Univ Bologna, DISI, Bologna, Italy
[3] ETHZ Zurich, IIS, Zurich, Switzerland
[4] CINECA, SCAI, Casalecchio Di Reno, Italy
Source
PROCEEDINGS OF THE 48TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOPS (ICPP 2019) | 2019
Keywords
HPC; Energy Efficiency; Quantum Espresso; Big Data; Anomaly Detection; Artificial Intelligence; Datacentre automation;
DOI
10.1145/3339186.3339215
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Energy efficiency and datacentre automation are critical targets of the research and deployment agenda of CINECA and its research partners in the Energy Efficient System Laboratory of the University of Bologna and the Integrated System Laboratory at ETH Zurich. In this manuscript, we present the primary outcomes of the research conducted in this domain under the umbrella of several European, national and private funding schemes. These outcomes consist of: (i) the ExaMon scalable, flexible, holistic monitoring framework, which is capable of ingesting 70 GB/day of telemetry data from the entire CINECA datacentre and linking this data with machine learning and artificial intelligence techniques and tools; (ii) the exploitation of ExaMon to evaluate the viability of machine-learning based job scheduling, power prediction and deep-learning based anomaly detection of compute nodes; (iii) the viability of scalable, out-of-band and high-frequency power monitoring in compute nodes, by leveraging low-cost and open-source embedded hardware and edge computing, namely DiG; (iv) finally, the viability of a run-time library, namely COUNTDOWN, that exploits communication regions in large-scale applications to reduce energy consumption without impairing execution time.
Pages: 8