Evaluating Controlled Memory Request Injection for Efficient Bandwidth Utilization and Predictable Execution in Heterogeneous SoCs

Cited by: 0

Authors:

Brilli, Gianluca [1]

Cavicchioli, Roberto [2]

Solieri, Marco [3]

Valente, Paolo [3]

Marongiu, Andrea [3]

Affiliations:

[1] Univ Modena & Reggio Emilia, Dept Ingn Enzo Ferrari, Via Pietro Vivarelli 10, Europe 41125, Modena, Italy

[2] Univ Modena & Reggio Emilia, Dept Sci & Methods Engn, Via Amendola 2, Reggio Emilia, Italy

[3] Univ Modena & Reggio Emilia, Dept Phys Informat & Math, Via Giuseppe Campi 213-b, I-91125 Modena, Italy

Source:

ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS | 2023 / Vol. 22 / Issue 01

Keywords:

Heterogeneous systems-on-chip; memory interference; Predictable Execution; MULTICORE; PERFORMANCE; MODEL;

DOI:

10.1145/3548773

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

High-performance embedded platforms are increasingly adopting heterogeneous systems-on-chip (HeSoCs) that couple multi-core CPUs with accelerators such as GPUs, FPGAs, or AI engines. Adopting HeSoCs in the context of real-time workloads is not immediately possible, though, as contention on shared resources like the memory hierarchy, and in particular the main memory (DRAM), causes unpredictable latency increases. To tackle this problem, both the research community and certification authorities mandate (i) that accesses from parallel threads to the shared system resources (typically, main memory) happen in a mutually exclusive manner by design, or (ii) that per-thread bandwidth regulation is enforced. Such arbitration schemes provide timing guarantees, but make poor use of the memory bandwidth available in a modern HeSoC. Controlled Memory Request Injection (CMRI) is a recently proposed bandwidth limitation concept that builds on top of a mutually exclusive schedule but still allows the threads currently not entitled to access memory to use as much of the unused bandwidth as possible without losing the timing guarantee. CMRI has been discussed in the context of a multi-core CPU, but the same principle also applies to a more complex system such as a HeSoC. In this article, we introduce two CMRI schemes suitable for HeSoCs: Voluntary Throttling via code refactoring and Bandwidth Regulation via dynamic throttling. We extensively characterize a proof-of-concept incarnation of both schemes on two HeSoCs, an NVIDIA Tegra TX2 and a Xilinx UltraScale+, highlighting the benefits and the costs of CMRI for synthetic workloads that model worst-case DRAM access. We also test the effectiveness of CMRI with real benchmarks, studying the effect of interference among the host CPU and the accelerators.


Pages: 25

10 items in total

[1] HePREM: A Predictable Execution Model for GPU-based Heterogeneous SoCs
Forsberg, Bjorn
Benini, Luca
Marongiu, Andrea
IEEE TRANSACTIONS ON COMPUTERS, 2021, 70 (01) : 17 - 29
[2] Toward More Efficient Scan Data Bandwidth Utilization on Modern SOCs
Dong, Yan
Giles, Grady
Li, GuoLiang
Rearick, Jeff
Schulze, John
Wingfield, James
Wood, Tim
2016 29TH IEEE INTERNATIONAL SYSTEM-ON-CHIP CONFERENCE (SOCC), 2016, : 64 - 68
[3] Shared Memory-contention-aware Concurrent DNN Execution for Diversely Heterogeneous SoCs
Dagli, Ismet
Belviranli, Mehmet E.
PROCEEDINGS OF THE 29TH ACM SIGPLAN ANNUAL SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING, PPOPP 2024, 2024, : 243 - 256
[4] Design of a Multichannel NAND Flash Memory Controller for Efficient Utilization of Bandwidth in SSDs
Jose, Soya Treesa
Pradeep, C.
2013 IEEE INTERNATIONAL MULTI CONFERENCE ON AUTOMATION, COMPUTING, COMMUNICATION, CONTROL AND COMPRESSED SENSING (IMAC4S), 2013, : 235 - 239
[5] Efficient Virtual Memory Sharing via On-Accelerator Page Table Walking in Heterogeneous Embedded SoCs
Vogel, Pirmin
Kurth, Andreas
Weinbuch, Johannes
Marongiu, Andrea
Benini, Luca
ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2017, 16
[6] OMBM-ML: efficient memory bandwidth management for ensuring QoS and improving server utilization
Sung, Hanul
Min, Jeesoo
Koo, Donghun
Eom, Hyeonsang
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2021, 24 (01) : 181 - 193
[7] OMBM-ML: efficient memory bandwidth management for ensuring QoS and improving server utilization
Hanul Sung
Jeesoo Min
Donghun Koo
Hyeonsang Eom
Cluster Computing, 2021, 24 : 181 - 193
[8] OMBM-ML: Efficient Memory Bandwidth Management for Ensuring QoS and Improving Server Utilization
Min, Jeesoo
Sung, Hanul
Eom, Hyeonsang
2018 IEEE 3RD INTERNATIONAL WORKSHOPS ON FOUNDATIONS AND APPLICATIONS OF SELF* SYSTEMS (FAS*W), 2018, : 72 - 78
[9] Scalable and Efficient Virtual Memory Sharing in Heterogeneous SoCs with TLB Prefetching and MMU-Aware DMA Engine
Kurth, Andreas
Vogel, Pirmin
Marongiu, Andrea
Benini, Luca
2018 IEEE 36TH INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD), 2018, : 292 - 300
[10] Thread assignment optimization with real-time performance and memory bandwidth guarantees for energy-efficient heterogeneous multi-core systems
Petrucci, Vinicius
Loques, Orlando
Mosse, Daniel
Melhem, Rami
Abou Gazala, Neven
Gobriel, Sameh
2012 IEEE 18TH REAL-TIME AND EMBEDDED TECHNOLOGY AND APPLICATIONS SYMPOSIUM (RTAS), 2012, : 263 - 272
