Evaluating Controlled Memory Request Injection for Efficient Bandwidth Utilization and Predictable Execution in Heterogeneous SoCs

Cited by: 0
Authors
Brilli, Gianluca [1]
Cavicchioli, Roberto [2]
Solieri, Marco [3]
Valente, Paolo [3]
Marongiu, Andrea [3]
Affiliations
[1] Univ Modena & Reggio Emilia, Dept Ingn Enzo Ferrari, Via Pietro Vivarelli 10, Europe 41125, Modena, Italy
[2] Univ Modena & Reggio Emilia, Dept Sci & Methods Engn, Via Amendola 2, Reggio Emilia, Italy
[3] Univ Modena & Reggio Emilia, Dept Phys Informat & Math, Via Giuseppe Campi 213-b, I-91125 Modena, Italy
Keywords
Heterogeneous systems-on-chip; Memory interference; Predictable execution; Multicore; Performance; Model
DOI
10.1145/3548773
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
High-performance embedded platforms are increasingly adopting heterogeneous systems-on-chip (HeSoCs) that couple multi-core CPUs with accelerators such as GPUs, FPGAs, or AI engines. Adopting HeSoCs for real-time workloads is not immediately possible, though, because contention on shared resources like the memory hierarchy, and in particular the main memory (DRAM), causes unpredictable latency increases. To tackle this problem, both the research community and certification authorities mandate (i) that accesses from parallel threads to the shared system resources (typically, main memory) happen in a mutually exclusive manner by design, or (ii) that per-thread bandwidth regulation is enforced. Such arbitration schemes provide timing guarantees, but make poor use of the memory bandwidth available in a modern HeSoC. Controlled Memory Request Injection (CMRI) is a recently proposed bandwidth limitation concept that builds on top of a mutually exclusive schedule but still allows the threads not currently entitled to access memory to use as much of the unused bandwidth as possible without losing the timing guarantee. CMRI has been discussed in the context of multi-core CPUs, but the same principle also applies to a more complex system such as a HeSoC. In this article, we introduce two CMRI schemes suitable for HeSoCs: Voluntary Throttling via code refactoring and Bandwidth Regulation via dynamic throttling. We extensively characterize a proof-of-concept incarnation of both schemes on two HeSoCs, an NVIDIA Tegra TX2 and a Xilinx UltraScale+, highlighting the benefits and the costs of CMRI for synthetic workloads that model worst-case DRAM access. We also test the effectiveness of CMRI with real benchmarks, studying the effect of interference between the host CPU and the accelerators.
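The general idea of per-thread bandwidth regulation mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation: it assumes a simple per-period request budget, and the names `Regulator`, `budget_per_period`, and `simulate` are hypothetical. A thread that is not currently entitled to the memory may inject at most a fixed number of requests per regulation period, and any excess demand is deferred to later periods.

```python
from dataclasses import dataclass


@dataclass
class Regulator:
    """Hypothetical per-thread regulator: caps injected requests per period."""
    budget_per_period: int  # assumed cap on requests a throttled thread may inject
    used: int = 0

    def new_period(self) -> None:
        # The budget is replenished at the start of every regulation period.
        self.used = 0

    def try_inject(self, requests: int) -> int:
        # Grant as many requests as the remaining budget allows.
        allowed = max(0, min(requests, self.budget_per_period - self.used))
        self.used += allowed
        return allowed


def simulate(demand_per_period, budget):
    """Return (requests granted in each period, backlog left at the end)."""
    reg = Regulator(budget)
    granted = []
    backlog = 0
    for demand in demand_per_period:
        reg.new_period()
        backlog += demand          # new demand joins any deferred requests
        ok = reg.try_inject(backlog)
        backlog -= ok              # the remainder waits for the next period
        granted.append(ok)
    return granted, backlog


if __name__ == "__main__":
    print(simulate([5, 0, 12, 1], budget=4))
```

With a budget of 4 requests per period, the bursty demand is smoothed so that the throttled thread never injects more than 4 requests in any period, which is what bounds the interference seen by the thread that owns the memory slot.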
Pages: 25