A High-Performance, Energy-Efficient Modular DMA Engine Architecture

Cited by: 4
Authors
Benz, Thomas [1]
Rogenmoser, Michael [1]
Scheffler, Paul [1]
Riedel, Samuel [1]
Ottaviano, Alessandro [1]
Kurth, Andreas [1]
Hoefler, Torsten [2]
Benini, Luca [3, 4]
Affiliations
[1] Swiss Fed Inst Technol, Integrated Syst Lab IIS, CH-8092 Zurich, Switzerland
[2] Swiss Fed Inst Technol, Scalable Parallel Comp Lab SPCL, CH-8092 Zurich, Switzerland
[3] Swiss Fed Inst Technol, Integrated Syst Lab IIS, Zurich, Switzerland
[4] Univ Bologna, Dept Elect Elect & Informat Engn DEI, I-40126 Bologna, Italy
Keywords
DMA; DMAC; direct memory access; memory systems; high-performance; energy-efficiency; edge AI; AXI; TileLink
DOI
10.1109/TC.2023.3329930
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Data transfers are essential in today's computing systems as latency and complex memory access patterns are increasingly challenging to manage. Direct memory access engines (DMAEs) are critically needed to transfer data independently of the processing elements, hiding latency and achieving high throughput even for complex access patterns to high-latency memory. With the prevalence of heterogeneous systems, DMAEs must operate efficiently in increasingly diverse environments. This work proposes a modular and highly configurable open-source DMAE architecture called intelligent DMA (iDMA), split into three parts that can be composed and customized independently. The front-end implements the control plane binding to the surrounding system. The mid-end accelerates complex data transfer patterns such as multi-dimensional transfers, scattering, or gathering. The back-end interfaces with the on-chip communication fabric (data plane). We assess the efficiency of iDMA in various instantiations: In high-performance systems, we achieve speedups of up to 15.8× with only 1% additional area compared to a base system without a DMAE. We achieve an area reduction of 10% while improving ML inference performance by 23% in ultra-low-energy edge AI systems over an existing DMAE solution. We provide area, timing, latency, and performance characterization to guide its instantiation in various systems.
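As a rough illustration of the mid-end role described in the abstract, the sketch below expands one two-dimensional transfer into the one-dimensional transfers a back-end could then execute. It is a software model under stated assumptions, not iDMA's actual interface: the names `Transfer1D` and `expand_2d` and all parameters are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Transfer1D:
    """Hypothetical descriptor for one contiguous (1D) copy."""
    src: int      # source byte address
    dst: int      # destination byte address
    length: int   # number of bytes to copy


def expand_2d(src: int, dst: int, row_bytes: int, rows: int,
              src_stride: int, dst_stride: int) -> Iterator[Transfer1D]:
    """Split a strided 2D transfer into one 1D transfer per row.

    Assumed model: each row is contiguous, and consecutive rows are
    src_stride / dst_stride bytes apart on the two sides.
    """
    for row in range(rows):
        yield Transfer1D(src + row * src_stride,
                         dst + row * dst_stride,
                         row_bytes)


if __name__ == "__main__":
    # Gather three 16-byte rows, 64 bytes apart, into a packed buffer.
    for t in expand_2d(0x1000, 0x8000, 16, 3, 64, 16):
        print(hex(t.src), hex(t.dst), t.length)
```

In this model the requesting side describes the whole pattern once, and the per-row address arithmetic happens in the expansion step rather than in the processing element.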
Pages: 263-277
Number of pages: 15