LION: Real-Time I/O Transfer Control for Massively Parallel Processor Arrays

Cited by: 2

Authors:

Walter, Dominik [1]

Teich, Juergen [1]

Affiliations:

[1] Friedrich Alexander Univ Erlangen Nurnberg, Hardware Software Codesign, Erlangen, Germany

Source:

2021 19TH ACM-IEEE INTERNATIONAL CONFERENCE ON FORMAL METHODS AND MODELS FOR SYSTEM DESIGN (MEMOCODE) | 2022

Keywords:

Massively Parallel Processor Arrays; TCPA; I/O Scheduling; Data Transfers; Priority Queues; PRIORITY QUEUE MANAGEMENT; ARCHITECTURE

DOI:

10.1145/3487212.3487349

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

The performance of many accelerator architectures depends on communication with external memory. During execution, I/O data is continuously transferred back and forth between the accelerator and memory. This data exchange is very often performance-critical, so careful orchestration is vital. To satisfy the I/O demand of accelerators for loop nests, it was shown that individual reads and writes can be merged into larger blocks, each of which is then moved by a single DMA transfer. Furthermore, determining the order in which such DMA transfers must be issued was shown to be reducible to a real-time task scheduling problem that is solved at run time. Going beyond these concepts, this paper investigates efficient algorithms and data structures, and their hardware implementation, for a programmable Loop I/O Controller architecture called LION. LION only needs to be synthesized once for each processor array size and I/O buffer configuration, and thus supports a large class of processor arrays. Based on a proposed heap-based priority queue, LION is able to issue a new DMA request to a memory bus every 6 cycles. Even on a simple FPGA prototype running at just 200 MHz, this allows more than 33 million DMA requests to be issued per second. Since the execution time of a typical DMA request is in general at least one order of magnitude longer, we can conclude that this rate is sufficient to fully utilize a given memory interface. Finally, we present implementations on an FPGA and in a 22nm FDX ASIC, showing that the overhead of a LION typically amounts to less than 5% of an overall processor array design.
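The two quantitative ideas in the abstract, ordering DMA requests with a heap-based priority queue and the issue rate implied by one request every 6 cycles at 200 MHz, can be illustrated with a small software sketch. This is not the LION hardware design: the earliest-deadline-first ordering, the `DmaRequest` fields, and the function names below are assumptions made for illustration, since the abstract only states that the ordering reduces to a real-time task scheduling problem.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class DmaRequest:
    # Assumed priority key: the cycle by which the transfer must be issued.
    deadline: int
    # Hypothetical label for the block being transferred; not used for ordering.
    name: str = field(compare=False)


def issue_order(requests):
    """Return request names in the order a min-heap would release them.

    Assumption: the smallest deadline is issued first (earliest deadline first).
    """
    heap = []
    for req in requests:
        heapq.heappush(heap, req)      # O(log n) insertion
    order = []
    while heap:
        order.append(heapq.heappop(heap).name)  # O(log n) removal of the minimum
    return order


def requests_per_second(clock_hz, cycles_per_request):
    """Issue rate when one request is released every `cycles_per_request` cycles."""
    return clock_hz / cycles_per_request


if __name__ == "__main__":
    pending = [DmaRequest(30, "block_c"), DmaRequest(10, "block_a"), DmaRequest(20, "block_b")]
    print(issue_order(pending))
    # Figures from the abstract: 200 MHz clock, one request every 6 cycles.
    print(requests_per_second(200_000_000, 6))
```

With the abstract's figures, the rate works out to 200,000,000 / 6, which is roughly 33.3 million requests per second and matches the stated "more than 33 million".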

Pages: 32-43

Page count: 12

50 records in total

[1] Real-time Scheduling of I/O Transfers for Massively Parallel Processor Arrays
Walter, Dominik
Witterauf, Michael
Teich, Juergen
2020 18TH ACM-IEEE INTERNATIONAL CONFERENCE ON FORMAL METHODS AND MODELS FOR SYSTEM DESIGN (MEMOCODE), 2020, : 104 - 114
[2] A Scalable Massively Parallel Processor for Real-Time Image Processing
Kurafuji, Takashi
Haraguchi, Masaru
Nakajima, Masami
Nishijima, Tetsu
Tanizaki, Tetsushi
Yamasaki, Hiroyuki
Sugimura, Takeaki
Imai, Yuta
Ishizaki, Masakatsu
Kumaki, Takeshi
Murata, Kan
Yoshida, Kanako
Shimomura, Eisuke
Noda, Hideyuki
Okuno, Yoshihiro
Kamijo, Shunsuke
Koide, Tetsushi
Mattausch, Hans Juergen
Arimoto, Kazutami
IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2011, 46 (10) : 2363 - 2373
[3] VIRIM: A massively parallel processor for real-time volume visualization in medicine
Comput Graphics (Pergamon), 5 (705):
[4] A system-on-a-programmable-chip for real-time control of massively parallel arrays of biosensors and actuators
Romani, A
Campi, F
Ronconi, S
Tartagni, M
Medoro, G
Manaresi, N
3RD IEEE INTERNATIONAL WORKSHOP ON SYSTEM-ON-CHIP FOR REAL-TIME APPLICATIONS, PROCEEDINGS, 2003, : 236 - 241
[5] Responsive Processor for parallel/distributed real-time control
Yamasaki, N
IROS 2001: PROCEEDINGS OF THE 2001 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, VOLS 1-4: EXPANDING THE SOCIETAL ROLE OF ROBOTICS IN THE NEXT MILLENNIUM, 2001, : 1238 - 1244
[6] Scaling OpenSHMEM for Massively Parallel Processor Arrays
Ross, James A.
Richie, David A.
OPENSHMEM AND RELATED TECHNOLOGIES: OPENSHMEM IN THE ERA OF EXTREME HETEROGENEITY, OPENSHMEM 2018, 2019, 11283 : 137 - 147
[7] Design of processor arrays for real-time applications
Fimmel, D
Merker, R
EURO-PAR '98 PARALLEL PROCESSING, 1998, 1470 : 1018 - 1028
[8] A parallel neural processor for real-time applications
Danese, G
Leporati, F
Ramat, S
IEEE MICRO, 2002, 22 (03) : 20 - 31
[9] Design space exploration for massively parallel processor arrays
Hannig, F
Teich, J
PARALLEL COMPUTING TECHNOLOGIES, 2001, 2127 : 51 - 65
[10] Determination of an optimal processor allocation in the design of massively parallel processor arrays
Fimmel, D
Merker, R
ICA(3)PP 97 - 1997 3RD INTERNATIONAL CONFERENCE ON ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, 1997, : 309 - 322
