FIFO-based Hardware Sorters for High Bandwidth Memory

被引：0

作者：

Nakano, Koji ^{[1
]}

Ito, Yasuaki ^{[1
]}

Bordim, Jacir L. ^{[2
]}

机构：

[1] Hiroshima Univ, Dept Informat Engn, Kagamiyama 1-4-1, Higashihiroshima 7398527, Japan

[2] Univ Brasilia, Dept Comp Sci, BR-70910900 Brasilia, DF, Brazil

来源：

2019 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW) | 2019年

关键词：

parallel sorting algorithms; hardware sorter; high bandwidth memory; burst memory access; big data analysis;

D O I：

10.1109/IPDPSW.2019.00112

中图分类号：

TP3 [计算技术、计算机技术];

学科分类号：

0812 ;

摘要：

The main contribution of this paper is to show efficient FIFO-based hardware sorters that sort n elements with w bits each stored in a high bandwidth memory with modest access latency. We assume that each address of the high bandwidth memory can store p elements of w bits each, which can be read or written at the same time. The access latency l of the high bandwidth memory is assumed to take l clock cycles to access p elements in a specified address. Furthermore, burst mode is supported and k (>= 1) consecutive addresses can be accessed in k + l - 1 clock cycles in a pipeline fashion. However, if k addresses are not consecutive, kl clock cycles are necessary to access all of them. Clearly, all n elements arranged n/p addresses can be duplicated in 2(n/p + l - 1) clock cycles. We present two types of hardware sorters that sort n = rc elements stored in an r x c matrix of the high bandwidth memory. We first develop Three-Pass-Sort and Four-Pass-Sort that sort an r x c matrix by reading from and witting in it three times and four times, respectively. We implement these two algorithms using FIFO-based mergers that can be configured as pairwise mode and sliding mode. Our hardware sorter based on Three-Pass-Sort runs in 6n/p + 3c(2)/p(2)l + O(c/p (l + log r) + r) clock cycles using a circuit of size O(rwp) provided that r >= c(2). Also, our hardware sorter based on Four-Pass-Sort runs in 8n/p + 2c(2)l + O(cl + log r + p) clock cycles using a circuit of size O(rw).

引用

页码：663 / 672

页数：10

共 21 条

[1] Ajtai Miklos, 1983, P 15 ANN ACM S THEOR, P1
[2] AKL S.G., 1990, PARALLEL SORTING ALG
[3] [Anonymous], 2014, 7 SER FPGAS MEM RES
[4] [Anonymous], 2014, 7 SER FPGAS CONF LOG
[5] Batcher K. E., 1968, P AFIPS SPRING JOINT, V32, P307
[6] Batcher KE., 1968, Proceedings of the Spring Joint Computer Conference, P307, DOI DOI 10.1145/1468075.1468121
[7] PARALLEL MERGE SORT
COLE, R
[J]. SIAM JOURNAL ON COMPUTING, 1988, 17 (04) : 770 - 785
[8] Cormen T. H., 1990, Introduction to Algorithms
[9] Gibbons A., 1988, Efficient Parallel Algorithms
[10] Harada N, 2016, INT SYMPOS COMPUT NE, P565, DOI [10.1109/CANDAR.2016.42, 10.1109/CANDAR.2016.0103]

← 1 2 3 →