Sky-Sorter: A Processing-in-Memory Architecture for Large-Scale Sorting

Cited by: 4
Authors
Zokaee, Farzaneh [1]
Chen, Fan [1]
Sun, Guangyu [2]
Jiang, Lei [3]
Affiliations
[1] Indiana Univ, Dept Intelligent Syst Engn, Bloomington, IN 47405 USA
[2] Peking Univ, Ctr Energy Efficient Comp & Applicat CECA, Beijing 100871, Peoples R China
[3] Indiana Univ, Intelligent Syst Engn, Dept Intelligent Syst Engn, Bloomington, IN USA
Keywords
Sorting; Micromagnetics; Hardware; Corporate acquisitions; Bandwidth; Throughput; System-on-chip; Processing-in-memory; large-scale sorting; SKYRMION; LOGIC;
DOI
10.1109/TC.2022.3169434
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Sorting is one of the most important algorithms in computer science. Conventional CPUs, GPUs, FPGAs, and ASICs that run sorting are fundamentally bottlenecked by off-chip memory bandwidth because of their von Neumann architecture. Processing-near-memory (PNM) designs integrate a CPU, a GPU, or an ASIC on top of an HBM for sorting, but their sorting throughput is still limited by the HBM bandwidth and capacity. In this paper, we propose a skyrmion racetrack memory (SRM)-based processing-in-memory (PIM) accelerator, Sky-Sorter, to improve sorting performance on large-scale datasets. Sky-Sorter implements samplesort, which involves four steps: sampling, splitting-marker sorting, partitioning, and bucket sorting. An SRM-based true random number generator (TRNG) is used in Sky-Sorter to randomly sample records from the dataset. Sky-Sorter divides the large dataset into many buckets, based on the sampled splitting markers, using our proposed SRM-based partitioner, whose partitioning throughput matches the off-chip memory bandwidth. We further design an SRM-based sorting unit (SU) that sorts all records of a bucket without introducing extra CMOS logic. The SU uses the fast in-cell insertion characteristics of SRMs to perform insertion sort within a bucket. Sky-Sorter employs SUs to sort all buckets simultaneously, fully utilizing the large internal array bandwidth. Compared to state-of-the-art accelerators, Sky-Sorter improves throughput per watt by ~4×.
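The four samplesort steps named in the abstract can be illustrated with a minimal software-only sketch. It is not the Sky-Sorter hardware design: a software pseudo-random generator stands in for the SRM-based TRNG, buckets are sorted one after another rather than simultaneously, and the names `samplesort`, `insertion_sort`, `num_buckets`, and `oversample` are assumptions made for this illustration only.

```python
import bisect
import random


def insertion_sort(bucket):
    """Sort a list in place by inserting each record into the sorted prefix."""
    for i in range(1, len(bucket)):
        key = bucket[i]
        j = i - 1
        while j >= 0 and bucket[j] > key:
            bucket[j + 1] = bucket[j]
            j -= 1
        bucket[j + 1] = key
    return bucket


def samplesort(records, num_buckets=4, oversample=8, rng=None):
    """Software sketch of samplesort; parameter names are illustrative assumptions."""
    rng = rng or random.Random()
    records = list(records)
    if len(records) <= 1 or num_buckets <= 1:
        return insertion_sort(records)

    # Step 1: sampling (a software PRNG stands in for the hardware TRNG).
    sample_size = min(len(records), num_buckets * oversample)
    sample = rng.sample(records, sample_size)

    # Step 2: splitting-marker sorting, then pick evenly spaced markers.
    insertion_sort(sample)
    step = len(sample) / num_buckets
    markers = [sample[int(i * step)] for i in range(1, num_buckets)]

    # Step 3: partitioning; bucket k receives records r with
    # markers[k-1] <= r < markers[k].
    buckets = [[] for _ in range(num_buckets)]
    for r in records:
        buckets[bisect.bisect_right(markers, r)].append(r)

    # Step 4: bucket sorting (sequential here; the abstract describes
    # sorting all buckets simultaneously in hardware).
    out = []
    for b in buckets:
        out.extend(insertion_sort(b))
    return out
```

Because the markers are taken from a sorted sample, every record in bucket k is no larger than any record in bucket k+1, so concatenating the individually sorted buckets yields a fully sorted result.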
Pages: 480-493
Page count: 14