Sky-Sorter: A Processing-in-Memory Architecture for Large-Scale Sorting

Cited by: 4

Authors:

Zokaee, Farzaneh [1]

Chen, Fan [1]

Sun, Guangyu [2]

Jiang, Lei [3]

Affiliations:

[1] Indiana Univ, Dept Intelligent Syst Engn, Bloomington, IN 47405 USA

[2] Peking Univ, Ctr Energy Efficient Comp & Applicat CECA, Beijing 100871, Peoples R China

[3] Indiana Univ, Intelligent Syst Engn, Dept Intelligent Syst Engn, Bloomington, IN USA

Source:

IEEE TRANSACTIONS ON COMPUTERS | 2023 / Vol. 72 / No. 02

Keywords:

Sorting; Micromagnetics; Hardware; Corporate acquisitions; Bandwidth; Throughput; System-on-chip; Processing-in-memory; large-scale sorting; SKYRMION; LOGIC;

DOI:

10.1109/TC.2022.3169434

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Sorting is one of the most important algorithms in computer science. Conventional CPUs, GPUs, FPGAs, and ASICs running sorting are fundamentally bottlenecked by the off-chip memory bandwidth because of their von Neumann architecture. Processing-near-memory (PNM) designs integrate a CPU, a GPU, or an ASIC on top of an HBM for sorting, but their sorting throughput is still limited by the HBM bandwidth and capacity. In this paper, we propose a skyrmion racetrack memory (SRM)-based processing-in-memory (PIM) accelerator, Sky-Sorter, to enhance the sorting performance of large-scale datasets. Sky-Sorter implements samplesort, which involves four steps: sampling, splitting marker sorting, partitioning, and bucket sorting. An SRM-based true random number generator (TRNG) is used in Sky-Sorter to randomly sample records from the dataset. Sky-Sorter divides the large dataset into many buckets based on sampled splitting markers using our proposed SRM-based partitioner, whose partitioning throughput matches the off-chip memory bandwidth. We further design an SRM-based sorting unit (SU) to sort all records of a bucket without introducing extra CMOS logic. Our SU uses the fast in-cell insertion characteristics of SRMs to perform insertion sort within a bucket. Sky-Sorter employs SUs to sort all buckets simultaneously by fully utilizing the large internal array bandwidth. Compared to state-of-the-art accelerators, Sky-Sorter improves the throughput per Watt by ~4×.
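The four samplesort steps named in the abstract can be illustrated with a minimal software sketch. This is only a sequential Python analogue under stated assumptions, not the Sky-Sorter hardware design: the function names, the `num_buckets` and `oversample` parameters, and the use of a seeded pseudo-random generator in place of the SRM-based TRNG are all hypothetical choices made for illustration.

```python
import bisect
import random


def insertion_sort(records):
    """Insertion sort: place each record into its slot in a growing sorted list.

    The slot is found by binary search (bisect.insort); this stands in for
    the in-cell insertion performed by a sorting unit in the abstract.
    """
    out = []
    for r in records:
        bisect.insort(out, r)
    return out


def samplesort(records, num_buckets=4, oversample=8, rng=None):
    """Sequential sketch of samplesort (parameters are illustrative assumptions)."""
    rng = rng or random.Random(0)  # assumption: a PRNG replaces the hardware TRNG
    if len(records) <= num_buckets:
        return insertion_sort(records)

    # Step 1: sampling. Draw a random subset of the records.
    sample_size = min(len(records), num_buckets * oversample)
    sample = rng.sample(records, sample_size)

    # Step 2: splitting marker sorting. Sort the sample and pick
    # num_buckets - 1 evenly spaced markers from it.
    sample.sort()
    step = len(sample) / num_buckets
    splitters = [sample[int(i * step)] for i in range(1, num_buckets)]

    # Step 3: partitioning. Route each record to the bucket whose
    # marker range contains it.
    buckets = [[] for _ in range(num_buckets)]
    for r in records:
        buckets[bisect.bisect_right(splitters, r)].append(r)

    # Step 4: bucket sorting. Sort each bucket independently (the hardware
    # does this in parallel; here it is a plain loop) and concatenate.
    result = []
    for b in buckets:
        result.extend(insertion_sort(b))
    return result
```

Because every record in bucket i is no larger than any record in bucket i+1, concatenating the independently sorted buckets yields a fully sorted output; that independence is what allows the buckets to be sorted simultaneously.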


Pages: 480-493

Page count: 14
