A High-Performance, Energy-Efficient Modular DMA Engine Architecture

Cited by: 4
Authors
Benz, Thomas [1]
Rogenmoser, Michael [1]
Scheffler, Paul [1]
Riedel, Samuel [1]
Ottaviano, Alessandro [1]
Kurth, Andreas [1]
Hoefler, Torsten [2]
Benini, Luca [3, 4]
Affiliations
[1] Swiss Fed Inst Technol, Integrated Syst Lab IIS, CH-8092 Zurich, Switzerland
[2] Swiss Fed Inst Technol, Scalable Parallel Comp Lab SPCL, CH-8092 Zurich, Switzerland
[3] Swiss Fed Inst Technol, Integrated Syst Lab IIS, Zurich, Switzerland
[4] Univ Bologna, Dept Elect Elect & Informat Engn DEI, I-40126 Bologna, Italy
Keywords
DMA; DMAC; direct memory access; memory systems; high-performance; energy-efficiency; edge AI; AXI; TileLink
DOI
10.1109/TC.2023.3329930
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Data transfers are essential in today's computing systems as latency and complex memory access patterns are increasingly challenging to manage. Direct memory access engines (DMAEs) are critically needed to transfer data independently of the processing elements, hiding latency and achieving high throughput even for complex access patterns to high-latency memory. With the prevalence of heterogeneous systems, DMAEs must operate efficiently in increasingly diverse environments. This work proposes a modular and highly configurable open-source DMAE architecture called intelligent DMA (iDMA), split into three parts that can be composed and customized independently. The front-end implements the control plane binding to the surrounding system. The mid-end accelerates complex data transfer patterns such as multi-dimensional transfers, scattering, or gathering. The back-end interfaces with the on-chip communication fabric (data plane). We assess the efficiency of iDMA in various instantiations: in high-performance systems, we achieve speedups of up to 15.8× with only 1% additional area compared to a base system without a DMAE. In ultra-low-energy edge AI systems, we achieve an area reduction of 10% while improving ML inference performance by 23% over an existing DMAE solution. We provide area, timing, latency, and performance characterization to guide its instantiation in various systems.
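The abstract does not specify the mid-end's descriptor format. What follows is a minimal Python sketch of how a mid-end could decompose a multi-dimensional transfer into the 1-D (contiguous) transfers a back-end issues. It assumes a hypothetical descriptor made of one contiguous inner chunk plus a list of (repetitions, source stride, destination stride) tuples for the outer dimensions. `Transfer1D`, `nd_to_1d`, and `backend_copy` are illustrative names, not iDMA's API, and the flat `bytearray` stands in for the on-chip fabric.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Transfer1D:
    """A contiguous copy request, as a back-end might accept it (hypothetical format)."""
    src: int     # source byte address
    dst: int     # destination byte address
    length: int  # number of contiguous bytes


def nd_to_1d(src: int, dst: int, length: int,
             dims: List[Tuple[int, int, int]]) -> Iterator[Transfer1D]:
    """Hypothetical mid-end: expand an N-D transfer into 1-D transfers.

    `dims` lists the outer dimensions, outermost first, each as
    (repetitions, src_stride, dst_stride) in bytes. With an empty `dims`,
    product() yields a single empty tuple, i.e. one plain 1-D transfer.
    """
    for idx in product(*(range(reps) for reps, _, _ in dims)):
        s = src + sum(i * ss for i, (_, ss, _) in zip(idx, dims))
        d = dst + sum(i * ds for i, (_, _, ds) in zip(idx, dims))
        yield Transfer1D(s, d, length)


def backend_copy(mem: bytearray, t: Transfer1D) -> None:
    """Hypothetical back-end: perform one contiguous copy in a flat memory model."""
    mem[t.dst:t.dst + t.length] = mem[t.src:t.src + t.length]


if __name__ == "__main__":
    mem = bytearray(128)
    mem[0:16] = bytes(range(16))  # 4x4 row-major byte matrix at address 0
    # Gather the 2x3 tile at rows 1-2, columns 1-3 into a packed buffer at 64:
    # inner chunk of 3 bytes, 2 rows, source stride 4, destination stride 3.
    for t in nd_to_1d(src=1 * 4 + 1, dst=64, length=3, dims=[(2, 4, 3)]):
        backend_copy(mem, t)
    print(list(mem[64:70]))  # [5, 6, 7, 9, 10, 11]
```

Keeping the decomposition separate from the copy mirrors the split described above: the mid-end only generates addresses, so the same sketch would work with any back-end that accepts contiguous requests.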
Pages: 263-277
Number of pages: 15