A High-Performance, Energy-Efficient Modular DMA Engine Architecture

Cited by: 4
Authors
Benz, Thomas [1]
Rogenmoser, Michael [1]
Scheffler, Paul [1]
Riedel, Samuel [1]
Ottaviano, Alessandro [1]
Kurth, Andreas [1]
Hoefler, Torsten [2]
Benini, Luca [3, 4]
Affiliations
[1] Swiss Fed Inst Technol, Integrated Syst Lab IIS, CH-8092 Zurich, Switzerland
[2] Swiss Fed Inst Technol, Scalable Parallel Comp Lab SPCL, CH-8092 Zurich, Switzerland
[3] Swiss Fed Inst Technol, Integrated Syst Lab IIS, Zurich, Switzerland
[4] Univ Bologna, Dept Elect Elect & Informat Engn DEI, I-40126 Bologna, Italy
Keywords
DMA; DMAC; direct memory access; memory systems; high-performance; energy-efficiency; edge AI; AXI; TileLink
DOI
10.1109/TC.2023.3329930
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Data transfers are essential in today's computing systems as latency and complex memory access patterns are increasingly challenging to manage. Direct memory access engines (DMAEs) are critically needed to transfer data independently of the processing elements, hiding latency and achieving high throughput even for complex access patterns to high-latency memory. With the prevalence of heterogeneous systems, DMAEs must operate efficiently in increasingly diverse environments. This work proposes a modular and highly configurable open-source DMAE architecture called intelligent DMA (iDMA), split into three parts that can be composed and customized independently. The front-end implements the control plane binding to the surrounding system. The mid-end accelerates complex data transfer patterns such as multi-dimensional transfers, scattering, or gathering. The back-end interfaces with the on-chip communication fabric (data plane). We assess the efficiency of iDMA in various instantiations: in high-performance systems, we achieve speedups of up to 15.8× with only 1% additional area compared to a base system without a DMAE. We achieve an area reduction of 10% while improving ML inference performance by 23% in ultra-low-energy edge AI systems over an existing DMAE solution. We provide area, timing, latency, and performance characterization to guide its instantiation in various systems.
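The role the abstract assigns to the mid-end, turning a multi-dimensional transfer into simpler requests for the back-end, can be illustrated with a small software model. This is only a sketch under stated assumptions, not the iDMA hardware or its actual interface: the names `Transfer1D` and `expand_nd`, and the choice to describe each dimension as a (source stride, destination stride, repetitions) triple, are hypothetical and introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Transfer1D:
    """One contiguous copy request, as a back-end might consume it (hypothetical)."""
    src: int     # source byte address
    dst: int     # destination byte address
    length: int  # number of contiguous bytes to copy


def expand_nd(src: int, dst: int, length: int,
              dims: List[Tuple[int, int, int]]) -> Iterator[Transfer1D]:
    """Expand an N-dimensional transfer into a sequence of 1D transfers.

    `dims` lists one (src_stride, dst_stride, repetitions) triple per
    dimension, ordered from innermost to outermost. This layout is an
    assumption of the sketch, not taken from the paper.
    """
    if not dims:
        # No dimensions left: emit a single contiguous transfer.
        yield Transfer1D(src, dst, length)
        return
    # Peel off the outermost dimension and recurse into the inner ones.
    *inner, (src_stride, dst_stride, reps) = dims
    for i in range(reps):
        yield from expand_nd(src + i * src_stride,
                             dst + i * dst_stride,
                             length, inner)


# Example: gather a tile of 3 rows x 4 bytes from a 16-byte-wide source
# buffer into a densely packed destination buffer.
tile = list(expand_nd(src=0x1000, dst=0x8000, length=4, dims=[(16, 4, 3)]))
```

In this model the front-end would correspond to whatever code calls `expand_nd` with a transfer descriptor, and the back-end to whatever consumes the resulting `Transfer1D` requests; keeping the expansion separate mirrors the composability the abstract describes.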
Pages: 263-277
Page count: 15