A High-Performance, Energy-Efficient Modular DMA Engine Architecture

Cited by: 4
Authors
Benz, Thomas [1]
Rogenmoser, Michael [1]
Scheffler, Paul [1]
Riedel, Samuel [1]
Ottaviano, Alessandro [1]
Kurth, Andreas [1]
Hoefler, Torsten [2]
Benini, Luca [3, 4]
Affiliations
[1] Swiss Fed Inst Technol, Integrated Syst Lab IIS, CH-8092 Zurich, Switzerland
[2] Swiss Fed Inst Technol, Scalable Parallel Comp Lab SPCL, CH-8092 Zurich, Switzerland
[3] Swiss Fed Inst Technol, Integrated Syst Lab IIS, Zurich, Switzerland
[4] Univ Bologna, Dept Elect Elect & Informat Engn DEI, I-40126 Bologna, Italy
Keywords
DMA; DMAC; direct memory access; memory systems; high-performance; energy-efficiency; edge AI; AXI; TileLink;
DOI
10.1109/TC.2023.3329930
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Data transfers are essential in today's computing systems as latency and complex memory access patterns are increasingly challenging to manage. Direct memory access engines (DMAEs) are critically needed to transfer data independently of the processing elements, hiding latency and achieving high throughput even for complex access patterns to high-latency memory. With the prevalence of heterogeneous systems, DMAEs must operate efficiently in increasingly diverse environments. This work proposes a modular and highly configurable open-source DMAE architecture called intelligent DMA (iDMA), split into three parts that can be composed and customized independently. The front-end implements the control plane binding to the surrounding system. The mid-end accelerates complex data transfer patterns such as multi-dimensional transfers, scattering, or gathering. The back-end interfaces with the on-chip communication fabric (data plane). We assess the efficiency of iDMA in various instantiations: In high-performance systems, we achieve speedups of up to 15.8× with only 1% additional area compared to a base system without a DMAE. We achieve an area reduction of 10% while improving ML inference performance by 23% in ultra-low-energy edge AI systems over an existing DMAE solution. We provide area, timing, latency, and performance characterization to guide its instantiation in various systems.
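As a rough illustration of what a mid-end for multi-dimensional transfers might do, the sketch below expands one 2D transfer into a sequence of 1D transfers that a back-end could then issue on the fabric. This is a software model under stated assumptions, not the iDMA implementation: the names `Transfer1D` and `expand_2d`, the field layout, and the stride semantics are all hypothetical and do not come from the abstract.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Transfer1D:
    """One linear transfer, as a back-end might consume it (hypothetical format)."""
    src: int     # source byte address
    dst: int     # destination byte address
    length: int  # number of bytes to copy


def expand_2d(src: int, dst: int, row_bytes: int, rows: int,
              src_stride: int, dst_stride: int) -> Iterator[Transfer1D]:
    """Hypothetical mid-end step: emit one 1D transfer per row of a 2D transfer.

    Strides are the byte distances between the starts of consecutive rows.
    """
    for r in range(rows):
        yield Transfer1D(src + r * src_stride, dst + r * dst_stride, row_bytes)


# Example: copy a 3-row, 16-byte-wide tile out of a buffer with a 64-byte row
# pitch into a densely packed destination (16-byte pitch).
transfers = list(expand_2d(src=0x1000, dst=0x8000, row_bytes=16, rows=3,
                           src_stride=64, dst_stride=16))
```

In this model the processing element describes the whole tile once and the per-row address arithmetic happens in the expansion step, which mirrors the abstract's split between control plane (front-end), pattern handling (mid-end), and data movement (back-end).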
Pages: 263-277
Page count: 15
Related papers
50 in total
  • [21] A High-Performance and Energy-Efficient Ternary Multiplier Using CNTFETs
    Erfan Abbasian
    Sobhan Sofimowloodi
    Arabian Journal for Science and Engineering, 2023, 48 : 14365 - 14379
  • [22] Energy-Efficient Encoding for High-Performance Buses with Staggered Repeaters
    Jayaprakash, Sharath
    Mahapatra, Nihar R.
    2009 IEEE COMPUTER SOCIETY ANNUAL SYMPOSIUM ON VLSI, 2009, : 252 - 257
  • [23] Graphicionado: A High-Performance and Energy-Efficient Accelerator for Graph Analytics
    Ham, Tae Jun
    Wu, Lisa
    Sundaram, Narayanan
    Satish, Nadathur
    Martonosi, Margaret
    2016 49TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO), 2016,
  • [24] Kickstarting High-performance Energy-efficient Manycore Architectures with Epiphany
    Olofsson, Andreas
    Nordstrom, Tomas
    Ul-Abdin, Zain
    CONFERENCE RECORD OF THE 2014 FORTY-EIGHTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, 2014, : 1719 - 1726
  • [25] High-performance and energy-efficient heterogeneous subword parallel instructions
    Kim, J
    Wills, DS
    SIPS 2003: IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS: DESIGN AND IMPLEMENTATION, 2003, : 75 - 80
  • [26] Energy-Efficient Design Methodologies: High-Performance VLSI Adders
    Zeydel, Bart R.
    Baran, Dursun
    Oklobdzija, Vojin G.
    IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2010, 45 (06) : 1220 - 1233
  • [27] High-Performance and Energy-Efficient 3D Manycore GPU Architecture for Accelerating Graph Analytics
    Choudhury, Dwaipayan
    Rajam, Aravind Sukumaran
    Kalyanaraman, Ananth
    Pande, Partha Pratim
    ACM JOURNAL ON EMERGING TECHNOLOGIES IN COMPUTING SYSTEMS, 2022, 18 (01)
  • [28] Morph-GCNX: A Universal Architecture for High-Performance and Energy-Efficient Graph Convolutional Network Acceleration
    Wang, Ke
    Zheng, Hao
    Li, Jiajun
    Louri, Ahmed
    IEEE TRANSACTIONS ON SUSTAINABLE COMPUTING, 2024, 9 (02) : 115 - 127
  • [29] Energy-efficient, high-performance and memory efficient FIR adaptive filter architecture of wireless sensor networks for IoT applications
    Kumar, J. Charles Rajesh
    Kumar, D. Vinod
    SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2022, 47 (04):
  • [30] Energy-efficient, high-performance and memory efficient FIR adaptive filter architecture of wireless sensor networks for IoT applications
    J Charles Rajesh Kumar
    D Vinod Kumar
    Sādhanā, 47