FIFO-based Hardware Sorters for High Bandwidth Memory

被引:0
作者
Nakano, Koji [1 ]
Ito, Yasuaki [1 ]
Bordim, Jacir L. [2 ]
机构
[1] Hiroshima Univ, Dept Informat Engn, Kagamiyama 1-4-1, Higashihiroshima 7398527, Japan
[2] Univ Brasilia, Dept Comp Sci, BR-70910900 Brasilia, DF, Brazil
来源
2019 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW) | 2019年
关键词
parallel sorting algorithms; hardware sorter; high bandwidth memory; burst memory access; big data analysis;
D O I
10.1109/IPDPSW.2019.00112
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
The main contribution of this paper is to show efficient FIFO-based hardware sorters that sort n elements with w bits each stored in a high bandwidth memory with modest access latency. We assume that each address of the high bandwidth memory can store p elements of w bits each, which can be read or written at the same time. The access latency l of the high bandwidth memory is assumed to take l clock cycles to access p elements in a specified address. Furthermore, burst mode is supported and k (>= 1) consecutive addresses can be accessed in k + l - 1 clock cycles in a pipeline fashion. However, if k addresses are not consecutive, kl clock cycles are necessary to access all of them. Clearly, all n elements arranged n/p addresses can be duplicated in 2(n/p + l - 1) clock cycles. We present two types of hardware sorters that sort n = rc elements stored in an r x c matrix of the high bandwidth memory. We first develop Three-Pass-Sort and Four-Pass-Sort that sort an r x c matrix by reading from and witting in it three times and four times, respectively. We implement these two algorithms using FIFO-based mergers that can be configured as pairwise mode and sliding mode. Our hardware sorter based on Three-Pass-Sort runs in 6n/p + 3c(2)/p(2)l + O(c/p (l + log r) + r) clock cycles using a circuit of size O(rwp) provided that r >= c(2). Also, our hardware sorter based on Four-Pass-Sort runs in 8n/p + 2c(2)l + O(cl + log r + p) clock cycles using a circuit of size O(rw).
引用
收藏
页码:663 / 672
页数:10
相关论文
共 21 条
  • [1] Ajtai Miklos, 1983, P 15 ANN ACM S THEOR, P1
  • [2] AKL S.G., 1990, PARALLEL SORTING ALG
  • [3] [Anonymous], 2014, 7 SER FPGAS MEM RES
  • [4] [Anonymous], 2014, 7 SER FPGAS CONF LOG
  • [5] Batcher K. E., 1968, P AFIPS SPRING JOINT, V32, P307
  • [6] Batcher KE., 1968, Proceedings of the Spring Joint Computer Conference, P307, DOI DOI 10.1145/1468075.1468121
  • [7] PARALLEL MERGE SORT
    COLE, R
    [J]. SIAM JOURNAL ON COMPUTING, 1988, 17 (04) : 770 - 785
  • [8] Cormen T. H., 1990, Introduction to Algorithms
  • [9] Gibbons A., 1988, Efficient Parallel Algorithms
  • [10] Harada N, 2016, INT SYMPOS COMPUT NE, P565, DOI [10.1109/CANDAR.2016.42, 10.1109/CANDAR.2016.0103]