A High-Performance, Energy-Efficient Modular DMA Engine Architecture

Cited by: 4
Authors
Benz, Thomas [1]
Rogenmoser, Michael [1]
Scheffler, Paul [1]
Riedel, Samuel [1]
Ottaviano, Alessandro [1]
Kurth, Andreas [1]
Hoefler, Torsten [2]
Benini, Luca [3,4]
Affiliations
[1] Swiss Fed Inst Technol, Integrated Syst Lab IIS, CH-8092 Zurich, Switzerland
[2] Swiss Fed Inst Technol, Scalable Parallel Comp Lab SPCL, CH-8092 Zurich, Switzerland
[3] Swiss Fed Inst Technol, Integrated Syst Lab IIS, Zurich, Switzerland
[4] Univ Bologna, Dept Elect Elect & Informat Engn DEI, I-40126 Bologna, Italy
Keywords
DMA; DMAC; direct memory access; memory systems; high-performance; energy-efficiency; edge AI; AXI; TileLink
DOI
10.1109/TC.2023.3329930
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Data transfers are essential in today's computing systems, where latency and complex memory access patterns are increasingly challenging to manage. Direct memory access engines (DMAEs) are critically needed to transfer data independently of the processing elements, hiding latency and achieving high throughput even for complex access patterns to high-latency memory. With the prevalence of heterogeneous systems, DMAEs must operate efficiently in increasingly diverse environments. This work proposes a modular and highly configurable open-source DMAE architecture called intelligent DMA (iDMA), split into three parts that can be composed and customized independently. The front-end implements the control-plane binding to the surrounding system. The mid-end accelerates complex data transfer patterns such as multi-dimensional transfers, scattering, and gathering. The back-end interfaces with the on-chip communication fabric (the data plane). We assess the efficiency of iDMA in various instantiations: in high-performance systems, we achieve speedups of up to 15.8× with only 1% additional area compared to a base system without a DMAE. In ultra-low-energy edge AI systems, we reduce area by 10% while improving ML inference performance by 23% over an existing DMAE solution. We provide area, timing, latency, and performance characterization to guide its instantiation in various systems.
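The front-end / mid-end / back-end split can be illustrated with a small behavioral model. The sketch below is a hypothetical software analogy, not iDMA's hardware implementation: it assumes a single flat byte-addressable memory, and all names (`Transfer1D`, `Transfer2D`, `mid_end_2d`, `back_end_copy`) are invented for illustration. A mid-end receives one strided 2-D transfer, as a front-end might hand it over, and emits a stream of contiguous 1-D transfers that a back-end then executes on the data plane.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Transfer1D:
    """Hypothetical contiguous copy that a back-end would issue on the data plane."""
    src: int     # source byte address
    dst: int     # destination byte address
    length: int  # number of bytes


@dataclass(frozen=True)
class Transfer2D:
    """Hypothetical strided 2-D transfer as a front-end might receive it."""
    src: int         # source address of the first row
    dst: int         # destination address of the first row
    length: int      # bytes per row
    src_stride: int  # bytes between consecutive row starts in the source
    dst_stride: int  # bytes between consecutive row starts in the destination
    reps: int        # number of rows


def mid_end_2d(t: Transfer2D) -> Iterator[Transfer1D]:
    """Mid-end model: decompose one 2-D transfer into a stream of 1-D transfers."""
    for i in range(t.reps):
        yield Transfer1D(
            src=t.src + i * t.src_stride,
            dst=t.dst + i * t.dst_stride,
            length=t.length,
        )


def back_end_copy(mem: bytearray, t: Transfer1D) -> None:
    """Back-end model: perform one contiguous copy within a flat memory."""
    mem[t.dst:t.dst + t.length] = mem[t.src:t.src + t.length]


# Usage: gather a 4x4 tile starting at row 2, column 2 of an 8x8 row-major
# byte matrix (addresses 0..63) into a packed buffer at address 64.
mem = bytearray(range(64)) + bytearray(16)
tile = Transfer2D(src=2 * 8 + 2, dst=64, length=4, src_stride=8, dst_stride=4, reps=4)
for row in mid_end_2d(tile):
    back_end_copy(mem, row)
```

In this model, the front-end only needs to describe the whole tile once, and the back-end only ever sees simple contiguous copies; the mid-end is the sole place that understands strides. This mirrors the separation of concerns described above, under the stated assumptions.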
Pages: 263-277
Page count: 15