A High-Performance, Energy-Efficient Modular DMA Engine Architecture

Cited by: 4
Authors
Benz, Thomas [1]
Rogenmoser, Michael [1]
Scheffler, Paul [1]
Riedel, Samuel [1]
Ottaviano, Alessandro [1]
Kurth, Andreas [1]
Hoefler, Torsten [2]
Benini, Luca [3, 4]
Affiliations
[1] Swiss Fed Inst Technol, Integrated Syst Lab IIS, CH-8092 Zurich, Switzerland
[2] Swiss Fed Inst Technol, Scalable Parallel Comp Lab SPCL, CH-8092 Zurich, Switzerland
[3] Swiss Fed Inst Technol, Integrated Syst Lab IIS, Zurich, Switzerland
[4] Univ Bologna, Dept Elect Elect & Informat Engn DEI, I-40126 Bologna, Italy
Keywords
DMA; DMAC; direct memory access; memory systems; high-performance; energy-efficiency; edge AI; AXI; TileLink;
DOI
10.1109/TC.2023.3329930
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Data transfers are essential in today's computing systems as latency and complex memory access patterns are increasingly challenging to manage. Direct memory access engines (DMAEs) are critically needed to transfer data independently of the processing elements, hiding latency and achieving high throughput even for complex access patterns to high-latency memory. With the prevalence of heterogeneous systems, DMAEs must operate efficiently in increasingly diverse environments. This work proposes a modular and highly configurable open-source DMAE architecture called intelligent DMA (iDMA), split into three parts that can be composed and customized independently. The front-end implements the control plane binding to the surrounding system. The mid-end accelerates complex data transfer patterns such as multi-dimensional transfers, scattering, or gathering. The back-end interfaces with the on-chip communication fabric (data plane). We assess the efficiency of iDMA in various instantiations: in high-performance systems, we achieve speedups of up to 15.8× with only 1% additional area compared to a base system without a DMAE. We achieve an area reduction of 10% while improving ML inference performance by 23% in ultra-low-energy edge AI systems over an existing DMAE solution. We provide area, timing, latency, and performance characterization to guide its instantiation in various systems.
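The three-part split described in the abstract can be illustrated with a small software model. The sketch below is only an illustration under stated assumptions: `Transfer1D`, `expand_2d`, `run_backend`, and `copy_tile` are hypothetical names, memory is modeled as a flat byte array, and none of this reflects iDMA's actual interfaces or hardware behavior. It shows one plausible reading of the roles: a front-end stand-in accepts a single 2D descriptor, a mid-end stand-in expands it into contiguous 1D requests, and a back-end stand-in performs each contiguous copy.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Transfer1D:
    """One contiguous copy request (hypothetical back-end job format)."""
    src: int     # source byte address
    dst: int     # destination byte address
    length: int  # number of bytes to copy


def expand_2d(src: int, dst: int, row_bytes: int, rows: int,
              src_stride: int, dst_stride: int) -> Iterator[Transfer1D]:
    """Mid-end stand-in: split a 2D strided transfer into one 1D job per row."""
    for r in range(rows):
        yield Transfer1D(src + r * src_stride, dst + r * dst_stride, row_bytes)


def run_backend(memory: bytearray, job: Transfer1D) -> None:
    """Back-end stand-in: perform one contiguous copy in a flat byte array."""
    memory[job.dst:job.dst + job.length] = memory[job.src:job.src + job.length]


def copy_tile(memory: bytearray, src: int, dst: int, row_bytes: int, rows: int,
              src_stride: int, dst_stride: int) -> int:
    """Front-end stand-in: take one descriptor, drive mid-end and back-end.

    Returns the number of 1D jobs issued.
    """
    issued = 0
    for job in expand_2d(src, dst, row_bytes, rows, src_stride, dst_stride):
        run_backend(memory, job)
        issued += 1
    return issued


# Usage: gather a 3-row by 4-byte tile from an 8-byte-wide source image
# into a densely packed destination buffer starting at address 32.
memory = bytearray(range(64))
jobs_issued = copy_tile(memory, src=0, dst=32, row_bytes=4, rows=3,
                        src_stride=8, dst_stride=4)
```

In this model the processing element issues one descriptor and the expansion into per-row copies happens outside of it, which is the kind of offloading the abstract attributes to the mid-end.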
Pages: 263-277
Page count: 15