A High-Performance, Energy-Efficient Modular DMA Engine Architecture

Cited by: 4

Authors:

Benz, Thomas [1]

Rogenmoser, Michael [1]

Scheffler, Paul [1]

Riedel, Samuel [1]

Ottaviano, Alessandro [1]

Kurth, Andreas [1]

Hoefler, Torsten [2]

Benini, Luca [3,4]

Affiliations:

[1] Swiss Fed Inst Technol, Integrated Syst Lab IIS, CH-8092 Zurich, Switzerland

[2] Swiss Fed Inst Technol, Scalable Parallel Comp Lab SPCL, CH-8092 Zurich, Switzerland

[3] Swiss Fed Inst Technol, Integrated Syst Lab IIS, Zurich, Switzerland

[4] Univ Bologna, Dept Elect Elect & Informat Engn DEI, I-40126 Bologna, Italy

Source:

IEEE TRANSACTIONS ON COMPUTERS | 2024 / Vol. 73 / Issue 01

Keywords:

DMA; DMAC; direct memory access; memory systems; high-performance; energy-efficiency; edge AI; AXI; TileLink;

DOI:

10.1109/TC.2023.3329930

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Data transfers are essential in today's computing systems as latency and complex memory access patterns are increasingly challenging to manage. Direct memory access engines (DMAEs) are critically needed to transfer data independently of the processing elements, hiding latency and achieving high throughput even for complex access patterns to high-latency memory. With the prevalence of heterogeneous systems, DMAEs must operate efficiently in increasingly diverse environments. This work proposes a modular and highly configurable open-source DMAE architecture called intelligent DMA (iDMA), split into three parts that can be composed and customized independently. The front-end implements the control plane binding to the surrounding system. The mid-end accelerates complex data transfer patterns such as multi-dimensional transfers, scattering, or gathering. The back-end interfaces with the on-chip communication fabric (data plane). We assess the efficiency of iDMA in various instantiations: In high-performance systems, we achieve speedups of up to 15.8× with only 1% additional area compared to a base system without a DMAE. We achieve an area reduction of 10% while improving ML inference performance by 23% in ultra-low-energy edge AI systems over an existing DMAE solution. We provide area, timing, latency, and performance characterization to guide its instantiation in various systems.


Pages: 263-277

Page count: 15

