LION Real-Time I/O Transfer Control for Massively Parallel Processor Arrays

Cited by: 2
Authors
Walter, Dominik [1]
Teich, Juergen [1]
Affiliations
[1] Friedrich Alexander Univ Erlangen Nurnberg, Hardware Software Codesign, Erlangen, Germany
Source
2021 19TH ACM-IEEE INTERNATIONAL CONFERENCE ON FORMAL METHODS AND MODELS FOR SYSTEM DESIGN (MEMOCODE) | 2022
Keywords
Massively Parallel Processor Arrays; TCPA; I/O Scheduling; Data Transfers; Priority Queues; PRIORITY QUEUE MANAGEMENT; ARCHITECTURE
DOI
10.1145/3487212.3487349
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The performance of many accelerator architectures depends on the communication with external memory. During execution, new I/O data is continuously transferred back and forth between the accelerator and memory. This data exchange is very often performance-critical, and careful orchestration is thus vital. To satisfy the I/O demand for accelerators of loop nests, it was shown that the individual reads and writes can be merged into larger blocks, each of which is subsequently transferred by a single DMA transfer. Furthermore, the order in which such DMA transfers must be issued was shown to be reducible to a real-time task scheduling problem to be solved at run time. Going beyond concepts, this paper investigates efficient algorithms and data structures, together with their hardware implementation, for a programmable Loop I/O Controller architecture called LION, which only needs to be synthesized once for each processor array size and I/O buffer configuration, thus supporting a large class of processor arrays. Based on a proposed heap-based priority queue, LION can issue a new DMA request to a memory bus every 6 cycles. Even on a simple FPGA prototype running at just 200 MHz, this allows more than 33 million DMA requests to be issued per second. Since the execution time of a typical DMA request is in general at least one order of magnitude longer, we can conclude that this rate is sufficient to fully utilize a given memory interface. Finally, we present implementations on an FPGA and in a 22nm FDX ASIC, showing that the overall overhead of LION typically amounts to less than 5% of an overall processor array design.
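As an illustration of the two quantitative ideas in the abstract, the following is a minimal software sketch, not the hardware design described in the paper. It reproduces the request-rate arithmetic (200 MHz divided by 6 cycles per request) and shows a binary-heap priority queue that releases DMA requests in priority order. The names `DmaRequest`, `RequestQueue`, and `issue_order` are hypothetical, and using a deadline as the priority key is an assumption; the abstract only states that the ordering reduces to a real-time task scheduling problem.

```python
import heapq
from dataclasses import dataclass, field

CLOCK_HZ = 200_000_000   # FPGA prototype clock stated in the abstract
CYCLES_PER_REQUEST = 6   # one DMA request issued every 6 cycles (abstract)


def requests_per_second(clock_hz: int, cycles_per_request: int) -> float:
    """Upper bound on the DMA request issue rate."""
    return clock_hz / cycles_per_request


@dataclass(order=True)
class DmaRequest:
    # Assumption: requests are ordered by deadline (smaller = more urgent).
    deadline: int
    name: str = field(compare=False)


class RequestQueue:
    """Software stand-in for a heap-based priority queue (hypothetical)."""

    def __init__(self) -> None:
        self._heap: list[DmaRequest] = []

    def push(self, request: DmaRequest) -> None:
        heapq.heappush(self._heap, request)

    def pop(self) -> DmaRequest:
        # Returns the request with the smallest deadline.
        return heapq.heappop(self._heap)

    def __len__(self) -> int:
        return len(self._heap)


def issue_order(requests: list[DmaRequest]) -> list[str]:
    """Return request names in the order the queue would issue them."""
    queue = RequestQueue()
    for request in requests:
        queue.push(request)
    return [queue.pop().name for _ in range(len(queue))]


if __name__ == "__main__":
    rate = requests_per_second(CLOCK_HZ, CYCLES_PER_REQUEST)
    print(f"{rate / 1e6:.1f} million requests per second")
    pending = [DmaRequest(30, "c"), DmaRequest(10, "a"), DmaRequest(20, "b")]
    print(issue_order(pending))
```

Both heap operations take logarithmic time in the number of pending requests, which is one reason a heap is a natural fit for this kind of queue; the 6-cycle figure in the abstract refers to the authors' hardware implementation and is not modeled here.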
Pages: 32-43
Number of pages: 12
Related papers
50 in total
  • [1] Real-time Scheduling of I/O Transfers for Massively Parallel Processor Arrays
    Walter, Dominik
    Witterauf, Michael
    Teich, Juergen
    2020 18TH ACM-IEEE INTERNATIONAL CONFERENCE ON FORMAL METHODS AND MODELS FOR SYSTEM DESIGN (MEMOCODE), 2020, : 104 - 114
  • [2] A Scalable Massively Parallel Processor for Real-Time Image Processing
    Kurafuji, Takashi
    Haraguchi, Masaru
    Nakajima, Masami
    Nishijima, Tetsu
    Tanizaki, Tetsushi
    Yamasaki, Hiroyuki
    Sugimura, Takeaki
    Imai, Yuta
    Ishizaki, Masakatsu
    Kumaki, Takeshi
    Murata, Kan
    Yoshida, Kanako
    Shimomura, Eisuke
    Noda, Hideyuki
    Okuno, Yoshihiro
    Kamijo, Shunsuke
    Koide, Tetsushi
    Mattausch, Hans Juergen
    Arimoto, Kazutami
    IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2011, 46 (10) : 2363 - 2373
  • [4] A system-on-a-programmable-chip for real-time control of massively parallel arrays of biosensors and actuators
    Romani, A
    Campi, F
    Ronconi, S
    Tartagni, M
    Medoro, G
    Manaresi, N
    3RD IEEE INTERNATIONAL WORKSHOP ON SYSTEM-ON-CHIP FOR REAL-TIME APPLICATIONS, PROCEEDINGS, 2003, : 236 - 241
  • [5] Responsive Processor for parallel/distributed real-time control
    Yamasaki, N
    IROS 2001: PROCEEDINGS OF THE 2001 IEEE/RJS INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, VOLS 1-4: EXPANDING THE SOCIETAL ROLE OF ROBOTICS IN THE NEXT MILLENNIUM, 2001, : 1238 - 1244
  • [6] Scaling OpenSHMEM for Massively Parallel Processor Arrays
    Ross, James A.
    Richie, David A.
    OPENSHMEM AND RELATED TECHNOLOGIES: OPENSHMEM IN THE ERA OF EXTREME HETEROGENEITY, OPENSHMEM 2018, 2019, 11283 : 137 - 147
  • [7] Design of processor arrays for real-time applications
    Fimmel, D
    Merker, R
    EURO-PAR '98 PARALLEL PROCESSING, 1998, 1470 : 1018 - 1028
  • [8] A parallel neural processor for real-time applications
    Danese, G
    Leporati, F
    Ramat, S
    IEEE MICRO, 2002, 22 (03) : 20 - 31
  • [9] Design space exploration for massively parallel processor arrays
    Hannig, F
    Teich, J
    PARALLEL COMPUTING TECHNOLOGIES, 2001, 2127 : 51 - 65
  • [10] Determination of an optimal processor allocation in the design of massively parallel processor arrays
    Fimmel, D
    Merker, R
    ICA(3)PP 97 - 1997 3RD INTERNATIONAL CONFERENCE ON ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, 1997, : 309 - 322