Sky-Sorter: A Processing-in-Memory Architecture for Large-Scale Sorting

Cited by: 4

Authors:

Zokaee, Farzaneh [1]

Chen, Fan [1]

Sun, Guangyu [2]

Jiang, Lei [3]

Affiliations:

[1] Indiana Univ, Dept Intelligent Syst Engn, Bloomington, IN 47405 USA

[2] Peking Univ, Ctr Energy Efficient Comp & Applicat CECA, Beijing 100871, Peoples R China

[3] Indiana Univ, Intelligent Syst Engn, Dept Intelligent Syst Engn, Bloomington, IN USA

Source:

IEEE TRANSACTIONS ON COMPUTERS | 2023 / Vol. 72 / Issue 02

Keywords:

Sorting; Micromagnetics; Hardware; Corporate acquisitions; Bandwidth; Throughput; System-on-chip; Processing-in-memory; large-scale sorting; SKYRMION; LOGIC;

DOI:

10.1109/TC.2022.3169434

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Sorting is one of the most important algorithms in computer science. Conventional CPUs, GPUs, FPGAs, and ASICs running sorting are fundamentally bottlenecked by the off-chip memory bandwidth because of their von Neumann architecture. Processing-near-memory (PNM) designs integrate a CPU, a GPU, or an ASIC upon an HBM for sorting, but their sorting throughput is still limited by the HBM bandwidth and capacity. In this paper, we propose a skyrmion racetrack memory (SRM)-based processing-in-memory (PIM) accelerator, Sky-Sorter, to enhance the sorting performance of large-scale datasets. Sky-Sorter implements samplesort, which involves four steps: sampling, splitting-marker sorting, partitioning, and bucket sorting. An SRM-based true random number generator (TRNG) is used in Sky-Sorter to randomly sample records from the dataset. Sky-Sorter divides the large dataset into many buckets based on the sampled splitting markers using our proposed SRM-based partitioner, whose partitioning throughput matches the off-chip memory bandwidth. We further design an SRM-based sorting unit (SU) to sort all records of a bucket without introducing extra CMOS logic. Our SU uses the fast in-cell insertion characteristics of SRMs to perform insertion sort within a bucket. Sky-Sorter employs SUs to sort all buckets simultaneously by fully utilizing the large internal array bandwidth. Compared to state-of-the-art accelerators, Sky-Sorter improves the throughput per Watt by approximately 4×.
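The four samplesort steps named in the abstract can be illustrated with a minimal software sketch. This is not the Sky-Sorter hardware design: it is a plain Python analogy under stated assumptions. The function names `samplesort` and `insertion_sort`, and the parameters `num_buckets`, `oversample`, and `seed`, are hypothetical and chosen only for illustration. A seeded pseudo-random generator stands in for the TRNG, and the buckets are sorted one after another rather than simultaneously.

```python
import bisect
import random


def insertion_sort(records):
    """Sort a list in place by inserting each record into the sorted prefix."""
    for i in range(1, len(records)):
        key = records[i]
        j = i - 1
        # Shift larger records one slot right to open a gap for `key`.
        while j >= 0 and records[j] > key:
            records[j + 1] = records[j]
            j -= 1
        records[j + 1] = key
    return records


def samplesort(records, num_buckets=4, oversample=8, seed=0):
    """Software sketch of samplesort; all parameters are illustrative."""
    if len(records) <= 1 or num_buckets <= 1:
        return insertion_sort(list(records))

    # Assumption: a seeded pseudo-random generator replaces a hardware TRNG.
    rng = random.Random(seed)

    # Step 1: sampling. Draw a random subset of the records.
    sample_size = min(len(records), num_buckets * oversample)
    sample = rng.sample(list(records), sample_size)

    # Step 2: splitting-marker sorting. Sort the sample and pick
    # evenly spaced entries as the bucket boundaries.
    sample.sort()
    step = len(sample) / num_buckets
    splitters = [sample[int(i * step)] for i in range(1, num_buckets)]

    # Step 3: partitioning. Route each record to the bucket whose
    # key range contains it.
    buckets = [[] for _ in range(num_buckets)]
    for record in records:
        buckets[bisect.bisect_right(splitters, record)].append(record)

    # Step 4: bucket sorting. Sort each bucket with insertion sort and
    # concatenate; the buckets are already ordered relative to each other.
    result = []
    for bucket in buckets:
        result.extend(insertion_sort(bucket))
    return result
```

Because the buckets cover disjoint, ordered key ranges, they can be sorted independently of one another, which is the property that lets a design sort all buckets in parallel.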


Pages: 480-493

Page count: 14
