Determining Optimal Coherency Interface for Many-Accelerator SoCs Using Bayesian Optimization

Cited by: 5
Authors
Bhardwaj, Kshitij [1]
Havasi, Marton [2]
Yao, Yuan [1]
Brooks, David M. [1]
Hernandez Lobato, Jose Miguel [2]
Wei, Gu-Yeon [1]
Affiliations
[1] Harvard Univ, Cambridge, MA 02138 USA
[2] Univ Cambridge, Cambridge CB2 1TW, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
System-on-chip (SoC); hardware accelerators; coherence protocols; Bayesian optimization
DOI
10.1109/LCA.2019.2910521
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Modern systems-on-chip (SoCs) of the exascale computing era are complex: they consist of several general-purpose processing cores and also integrate many specialized hardware accelerators. Three coherency interfaces are commonly used to integrate accelerators with the memory hierarchy: non-coherent, coherent with the last-level cache (LLC-coherent), and fully coherent. However, using a single coherency interface for all the accelerators in an SoC can lead to significant overheads. In the non-coherent model, accelerators access main memory directly, which can incur a considerable performance penalty. In the LLC-coherent model, accelerators access the LLC but may suffer a performance bottleneck due to contention among several accelerators. The fully coherent model, which relies on private caches, can incur non-trivial power and area overheads. Given the limitations of each interface, this paper proposes a novel performance-aware hybrid coherency interface, in which different accelerators use different coherency models, decided at design time based on the target applications so as to optimize overall system performance. A new framework based on Bayesian optimization is also proposed to determine the optimal hybrid coherency interface, i.e., it uses machine learning to select the best-performing coherency model for each accelerator in the SoC. For image processing and classification workloads, the proposed framework determined that a hybrid interface achieves up to 23 percent better performance than the 'homogeneous' interfaces, in which all accelerators use a single coherency model.
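The design space described in the abstract can be illustrated with a small sketch. The code below is not the paper's framework: it replaces Bayesian optimization with exhaustive enumeration, which is only practical for a handful of accelerators (three coherency models and N accelerators give 3^N assignments), and it scores each assignment with a toy cost model. The accelerator names, the latency numbers, and the `LLC_CONTENTION` penalty are all hypothetical values chosen for illustration; the power and area overheads of private caches are not modeled.

```python
from itertools import product

MODELS = ("non-coherent", "llc-coherent", "fully-coherent")

# Hypothetical per-accelerator latencies in arbitrary units (not from the paper).
BASE_LATENCY = {
    "conv":     {"non-coherent": 10.0, "llc-coherent": 6.0, "fully-coherent": 5.0},
    "pool":     {"non-coherent": 4.0,  "llc-coherent": 5.0, "fully-coherent": 6.0},
    "classify": {"non-coherent": 9.0,  "llc-coherent": 5.0, "fully-coherent": 7.0},
}

# Hypothetical extra latency per additional accelerator sharing the LLC.
LLC_CONTENTION = 2.0


def system_latency(assignment):
    """Toy objective: total latency of an {accelerator: model} assignment."""
    llc_users = sum(1 for m in assignment.values() if m == "llc-coherent")
    total = 0.0
    for accel, model in assignment.items():
        latency = BASE_LATENCY[accel][model]
        if model == "llc-coherent":
            # Each LLC-coherent accelerator is slowed by every other LLC sharer.
            latency += LLC_CONTENTION * (llc_users - 1)
        total += latency
    return total


def best_hybrid():
    """Enumerate all 3^N assignments and return the lowest-latency one."""
    names = list(BASE_LATENCY)
    candidates = (
        dict(zip(names, combo)) for combo in product(MODELS, repeat=len(names))
    )
    best = min(candidates, key=system_latency)
    return best, system_latency(best)


def homogeneous_latencies():
    """Latency when every accelerator uses the same coherency model."""
    return {m: system_latency({a: m for a in BASE_LATENCY}) for m in MODELS}


if __name__ == "__main__":
    print(best_hybrid())
    print(homogeneous_latencies())
```

With these made-up numbers, the best assignment mixes all three models and beats every homogeneous configuration, which mirrors the qualitative claim of the abstract. In a realistic setting each call to `system_latency` would be an expensive simulation, and a sample-efficient optimizer such as Bayesian optimization would take the place of the exhaustive loop.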
Pages: 119-123
Page count: 5