Fast parallel stream compaction for IA-based multi/many-core processors

Times cited: 1

Authors:

Sun, Qiao [1,2]

Yang, Chao [1,3]

Wu, Changmao [1]

Li, Leisheng [1]

Liu, Fangfang [1]

Affiliations:

[1] Chinese Acad Sci, Inst Software, Beijing 100190, Peoples R China

[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China

[3] Chinese Acad Sci, State Key Lab Comp Sci, Beijing 100190, Peoples R China

Source:

2016 16TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID) | 2016

Keywords:

parallel stream compaction; automatic code generation; automatic code optimization; Xeon Phi;

DOI:

10.1109/CCGrid.2016.112

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Subject classification code:

081202;

Abstract:

Stream compaction, found frequently in a wide variety of applications, is a general primitive that reduces an input stream to the subset containing only the wanted elements, so that follow-on computation can be done efficiently. In this paper, we propose a fast parallel stream compaction for IA-based multi-/many-core processors. Unlike previously studied algorithms that depend heavily on a black-box parallel scan, we open the black box in the proposed algorithm and manually tailor it so that both the workload and the memory footprint are significantly reduced. By further eliminating conditional statements and applying automatic code generation/optimization to performance-critical kernels, the proposed parallel stream compaction achieves high performance in different cases and for various data types across different IA-based multi-/many-core platforms. Experimental results on three typical IA-based processors, namely a quad-core Core i7 CPU, a dual-socket 8-core Xeon CPU, and a 61-core Xeon Phi accelerator, show that the proposed implementation outperforms the corresponding parallel implementation in the state-of-the-art library Thrust. In addition, we apply it in a random-forest-based data classifier to show its potential to boost the performance of real-world applications.
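The abstract does not spell out the algorithm itself. As an illustration only, the following Python sketch (an assumption, not the authors' implementation, which is native code with automatically generated kernels) shows one common way to avoid a black-box, per-element scan: each chunk first counts its kept elements, an exclusive scan then runs over the per-chunk counts only, and each chunk finally writes its kept elements into its own output slice. The inner write loop stores every element unconditionally and advances the output position by the 0/1 predicate value, a common way of writing branch-free compaction. The names `compact`, `keep` and `num_chunks` are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def compact(data: Sequence[T], keep: Callable[[T], bool], num_chunks: int = 4) -> List[T]:
    """Return the elements of `data` for which `keep` is true, in order.

    Hypothetical blocked sketch: count per chunk, scan the chunk counts,
    then scatter each chunk into its own output slice.
    """
    n = len(data)
    num_chunks = max(1, min(num_chunks, n))
    bounds = [(c * n // num_chunks, (c + 1) * n // num_chunks) for c in range(num_chunks)]

    # Pass 1: per chunk, count kept elements and record the index of the last
    # kept one (-1 if none). Chunks are independent (one thread each in a
    # parallel version).
    counts: List[int] = []
    lasts: List[int] = []
    for lo, hi in bounds:
        count, last = 0, -1
        for j in range(lo, hi):
            flag = int(keep(data[j]))  # 1 if kept, 0 otherwise
            count += flag
            last += (j - last) * flag  # becomes j only when flag == 1
        counts.append(count)
        lasts.append(last)

    # Pass 2: exclusive scan over the per-chunk counts only (num_chunks
    # values, not n), giving each chunk's starting offset in the output.
    offsets: List[int] = []
    total = 0
    for count in counts:
        offsets.append(total)
        total += count
    out: list = [None] * total

    # Pass 3: per chunk, store every element unconditionally and advance the
    # output position by the 0/1 predicate. Stopping at the last kept element
    # keeps every store inside out[offset : offset + count], so chunks never
    # write to each other's slices. The predicate is re-evaluated rather than
    # stored, so extra memory stays O(num_chunks).
    for (lo, _hi), offset, last in zip(bounds, offsets, lasts):
        pos = 0
        for j in range(lo, last + 1):
            out[offset + pos] = data[j]
            pos += int(keep(data[j]))
    return out
```

For example, `compact(list(range(10)), lambda x: x % 3 == 0)` returns `[0, 3, 6, 9]`. Each chunk touches only its own input range and its own output slice, so the per-chunk loops could be handed to separate threads; in Python the sketch only models the data flow, not the speed-up.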

Pages: 736-745

Number of pages: 10
