Exploring Time-Predictable and High-Performance Last-Level Caches for Hard Real-Time Integrated CPU-GPU Processors

Cited by: 0
Authors
Wang X. [1]
Zhang W. [2]
Affiliations
[1] Department of Electrical and Computer Engineering, Virginia Commonwealth University, Richmond, VA
[2] Department of Computer Science and Engineering, University of Louisville, Louisville, KY
Funding
U.S. National Science Foundation
Keywords
Cache locking; Cache partitioning; GPU; Integrated CPU-GPU; Real-time systems
DOI
10.5626/jcse.2020.14.3.89
Abstract
Time predictability is crucial for hard real-time and safety-critical systems. In an integrated CPU-GPU (graphics processing unit) architecture, CPU and GPU accesses to the shared last-level cache (LLC) have diverse patterns and characteristics and can interfere heavily with one another, which can significantly degrade the performance and time predictability of both CPUs and GPUs. In this paper, we explore cache partitioning, cache locking, and a combination of the two to make the LLC time-predictable for integrated CPU-GPU processors while achieving high performance. By evaluating these LLC management approaches, we provide real-time system developers with recommendations on the most effective time-predictable LLC designs for heterogeneous CPU-GPU multicore processors. © 2020
Pages: 89-101
Page count: 12