Exploring Time-Predictable and High-Performance Last-Level Caches for Hard Real-Time Integrated CPU-GPU Processors

Cited by: 0
Authors
Wang X. [1 ]
Zhang W. [2 ]
Affiliations
[1] Department of Electrical and Computer Engineering, Virginia Commonwealth University, Richmond, VA
[2] Department of Computer Science and Engineering, University of Louisville, Louisville, KY
Funding
U.S. National Science Foundation
Keywords
Cache locking; Cache partitioning; GPU; Integrated CPU-GPU; Real-time systems;
DOI
10.5626/jcse.2020.14.3.89
Abstract
Time predictability is crucial for hard real-time and safety-critical systems. In an integrated CPU-GPU (graphics processing unit) architecture, the shared last-level cache (LLC) can suffer a large amount of interference between CPU and GPU accesses, which have diverse patterns and characteristics, and this interference can significantly degrade the performance and time predictability of both the CPU and the GPU. In this paper, we explore cache partitioning, cache locking, and their combination to make the LLC time-predictable for integrated CPU-GPU processors while still achieving high performance. By evaluating these LLC management approaches, we provide real-time system developers with recommendations on the most effective time-predictable LLC designs for heterogeneous CPU-GPU multicore processors.
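To make the abstract's approach concrete, the following is a minimal, illustrative sketch in C (not taken from the paper or its evaluation framework) of how a shared LLC with way-based CPU/GPU partitioning and per-line locking might be modeled. The cache geometry, the static 8/8 way split, and all names (line_t, ways_for, llc_access) are assumptions made purely for illustration.

/* Minimal illustrative sketch (assumed names and parameters, not the
   paper's design): a set-associative LLC with way-based CPU/GPU
   partitioning and per-line locking. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_SETS  1024
#define NUM_WAYS  16
#define CPU_WAYS  8            /* ways 0..7 for CPU, ways 8..15 for GPU */
#define LINE_SIZE 64

typedef struct {
    bool     valid;
    bool     locked;           /* locked lines are never evicted        */
    uint64_t tag;
    uint64_t lru;              /* larger value = more recently used     */
} line_t;

static line_t   cache[NUM_SETS][NUM_WAYS];
static uint64_t stamp;         /* global LRU timestamp                  */

/* Way range visible to a requestor under static way partitioning. */
static void ways_for(bool is_gpu, int *lo, int *hi)
{
    *lo = is_gpu ? CPU_WAYS : 0;
    *hi = is_gpu ? NUM_WAYS : CPU_WAYS;
}

/* Look up addr; on a miss, fill the LRU unlocked way inside the
   requestor's partition.  Returns true on a hit. */
static bool llc_access(uint64_t addr, bool is_gpu)
{
    uint64_t set = (addr / LINE_SIZE) % NUM_SETS;
    uint64_t tag = (addr / LINE_SIZE) / NUM_SETS;
    int lo, hi, victim = -1;

    ways_for(is_gpu, &lo, &hi);
    for (int w = lo; w < hi; w++) {
        line_t *l = &cache[set][w];
        if (l->valid && l->tag == tag) {          /* hit */
            l->lru = ++stamp;
            return true;
        }
        /* candidate victim: unlocked and either invalid or least recent */
        if (!l->locked &&
            (victim < 0 || !l->valid || l->lru < cache[set][victim].lru))
            victim = w;
    }
    if (victim >= 0)                              /* miss: allocate in LRU way */
        cache[set][victim] = (line_t){ .valid = true, .tag = tag,
                                       .lru = ++stamp };
    /* if every way in the partition is locked, the request bypasses the LLC */
    return false;
}

int main(void)
{
    /* Lock way 0 of the CPU partition in every set, e.g. to reserve it
       for preloaded WCET-critical lines. */
    for (int s = 0; s < NUM_SETS; s++)
        cache[s][0].locked = true;

    int first  = llc_access(0x1000, false);       /* CPU: cold miss */
    int second = llc_access(0x1000, false);       /* CPU: hit       */
    printf("CPU accesses to 0x1000: %d then %d\n", first, second);
    printf("GPU access to 0x1000 (own partition, so a miss): %d\n",
           llc_access(0x1000, true));
    return 0;
}

In this toy model, a burst of GPU misses can evict lines only within the GPU's half of each set, and a locked line is never chosen as a victim; that isolation is the intuition behind combining partitioning and locking to tighten worst-case execution time (WCET) bounds on the shared LLC.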
Pages: 89-101
Page count: 12