PARMA: Parallelization-Aware Run-Time Management for Energy-Efficient Many-Core Systems

Cited by: 8
Authors
Al-hayanni, Mohammed A. Noaman [1, 2]
Rafiev, Ashur [3, 4]
Xia, Fei [4 ]
Shafik, Rishad [5 ]
Romanovsky, Alexander [3 ]
Yakovlev, Alex [1 ]
Affiliations
[1] Newcastle Univ, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
[2] Univ Technol Baghdad, Dept Elect Engn, Baghdad 10001, Iraq
[3] Newcastle Univ, Sch Comp, Newcastle Upon Tyne, Tyne & Wear, England
[4] Newcastle Univ, Sch Engn, Newcastle Upon Tyne, Tyne & Wear, England
[5] Newcastle Univ, Elect Syst, Newcastle Upon Tyne, Tyne & Wear, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
IP networks; Computational modeling; Hardware; System performance; Optimization; Measurement; Monitoring; Run-time management; many-core; speedup; power modelling; energy-delay-product; energy per instruction; POWER; PERFORMANCE; VOLTAGE;
DOI
10.1109/TC.2020.2975787
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Performance and energy efficiency considerations have shifted computing paradigms from single-core to many-core architectures. At the same time, traditional speedup models such as Amdahl's Law face challenges in the run-time reasoning for system performance and energy efficiency, because these models typically assume limited variations of the parallel fraction. Moreover, the parallel fraction, which varies dynamically in workloads, is generally unknown at run-time without application-level instrumentation. This article describes novel performance/energy trade-off models based on realistic architectural considerations, which describe the parallel fraction and speedup as functions of performance counter values available in modern processors, removing the need for application-level instrumentation. These are then used to develop a Parallelization-Aware Run-time Management (PARMA) approach. PARMA aims at controlling core allocations and operating voltage/frequency points for energy efficiency, according to the varying workload parallel fractions. The efficacy of our models and the PARMA approach is extensively validated using a number of PARSEC benchmark applications, involving two performance/energy trade-off metrics: energy-delay-product (EDP), typically used in high-performance applications and energy per instruction (EPI), suitable for energy-aware applications. Up to 48 and 68 percent improvements in EDP and EPI have been observed using the PARMA approach compared with parallelization-agnostic methods.
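The quantities named in the abstract can be illustrated with a minimal sketch. It uses the classic Amdahl's Law speedup and the textbook definitions of EDP (energy times delay) and EPI (energy divided by instruction count). It is not the paper's counter-based model or the PARMA controller: the function names, the linear power model (`base_power_w + n * core_power_w`), and all default values are assumptions made purely for illustration.

```python
from typing import Tuple


def amdahl_speedup(p: float, n: int) -> float:
    """Classic Amdahl's Law: speedup on n cores for parallel fraction p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("parallel fraction p must lie in [0, 1]")
    if n < 1:
        raise ValueError("core count n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


def energy_delay_product(power_w: float, time_s: float) -> float:
    """EDP = energy * delay, with energy = average power * time."""
    return (power_w * time_s) * time_s


def energy_per_instruction(power_w: float, time_s: float, instructions: int) -> float:
    """EPI = energy / number of instructions executed."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return (power_w * time_s) / instructions


def best_core_count(
    p: float,
    max_cores: int,
    t1_s: float = 1.0,          # hypothetical single-core execution time
    core_power_w: float = 1.0,  # hypothetical power per active core
    base_power_w: float = 10.0, # hypothetical core-independent power
) -> Tuple[int, float]:
    """Pick the core count that minimizes EDP under the toy power model."""
    best_n, best_edp = 1, float("inf")
    for n in range(1, max_cores + 1):
        time_s = t1_s / amdahl_speedup(p, n)
        power_w = base_power_w + n * core_power_w  # assumed linear model
        edp = energy_delay_product(power_w, time_s)
        if edp < best_edp:
            best_n, best_edp = n, edp
    return best_n, best_edp
```

Under these assumptions, a fully serial workload (`p = 0`) gains nothing from extra cores, so the sketch selects a single core, while a highly parallel workload pushes the choice toward more cores. This is the kind of dependence on the parallel fraction that the abstract says a parallelization-aware manager should exploit.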
Pages: 1507-1518
Number of pages: 12