Fast parallel stream compaction for IA-based multi/many-core processors

Cited by: 1

Authors:

Sun, Qiao [1,2]

Yang, Chao [1,3]

Wu, Changmao [1]

Li, Leisheng [1]

Liu, Fangfang [1]

Affiliations:

[1] Chinese Acad Sci, Inst Software, Beijing 100190, Peoples R China

[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China

[3] Chinese Acad Sci, State Key Lab Comp Sci, Beijing 100190, Peoples R China

Source:

2016 16TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID) | 2016

Keywords:

parallel stream compaction; automatic code generation; automatic code optimization; Xeon Phi;

DOI:

10.1109/CCGrid.2016.112

Chinese Library Classification (CLC) number:

TP301 [Theory, methods];

Discipline classification code:

081202 ;

Abstract:

Stream compaction, frequently found in a large variety of applications, serves as a general primitive that reduces an input stream to a subset containing only the wanted elements so that the follow-on computation can be done efficiently. In this paper, we propose a fast parallel stream compaction for IA-based multi-/many-core processors. Unlike the previously studied algorithms that depend heavily on a black-box parallel scan, we open the black box in the proposed algorithm and manually tailor it so that both the workload and the memory footprint are significantly reduced. By further eliminating the conditional statements and applying automatic code generation/optimization for performance-critical kernels, the proposed parallel stream compaction achieves high performance in different cases and for various data types across different IA-based multi-/many-core platforms. Experimental results on three typical IA-based processors, including a quad-core Core-i7 CPU, a dual-socket 8-core Xeon CPU, and a 61-core Xeon Phi accelerator, show that the proposed implementation outperforms the reference parallel counterpart in the state-of-the-art library Thrust. In addition, we apply it in a random-forest-based data classifier to show its potential to boost the performance of real-world applications.
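
For readers unfamiliar with the primitive, the following is a minimal Python sketch of the general ideas the abstract names: a scan-based compaction, a variant whose inner loop has no conditional statement, and a chunked layout in which only the per-chunk counts are scanned. It is an illustration under our own assumptions, not the authors' implementation; the function names (`compact_scan`, `compact_branchless`, `compact_chunked`) and the `chunk` parameter are hypothetical, and the code is sequential and only mirrors the structure a parallel version would have.

```python
from itertools import accumulate


def compact_scan(values, keep):
    """Textbook scan-based compaction: flags -> exclusive prefix sum -> scatter."""
    flags = [1 if keep(v) else 0 for v in values]
    # sums[i] = number of kept elements before index i; sums[-1] = total kept.
    sums = list(accumulate(flags, initial=0))
    out = [None] * sums[-1]
    for v, f, offset in zip(values, flags, sums):
        if f:
            out[offset] = v
    return out


def compact_branchless(values, keep):
    """Same result with no conditional in the loop body (illustrative only)."""
    out = [None] * len(values)
    pos = 0
    for v in values:
        out[pos] = v                # always write the element
        pos += int(bool(keep(v)))   # advance the cursor only if it is kept
    return out[:pos]


def compact_chunked(values, keep, chunk=4):
    """Chunked layout: scan only the per-chunk counts, not every element."""
    chunks = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    # Pass 1: each chunk counts its kept elements (independent per chunk).
    counts = [sum(1 for v in c if keep(v)) for c in chunks]
    # Small scan over chunk counts gives each chunk its output start offset.
    starts = list(accumulate(counts, initial=0))
    out = [None] * starts[-1]
    # Pass 2: each chunk writes its kept elements at its own offset.
    for c, start in zip(chunks, starts):
        kept = compact_branchless(c, keep)
        out[start:start + len(kept)] = kept
    return out
```

All three functions preserve the input order of the kept elements. In the chunked variant the scan runs over one count per chunk rather than one flag per element, which is one common way to shrink scan work and temporary storage; whether this matches the paper's tailoring of the scan is not stated in the abstract.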

Citation

Pages: 736-745

Number of pages: 10
