Fast parallel stream compaction for IA-based multi/many-core processors

Cited by: 1

Authors:

Sun, Qiao [1,2]

Yang, Chao [1,3]

Wu, Changmao [1]

Li, Leisheng [1]

Liu, Fangfang [1]

Affiliations:

[1] Chinese Acad Sci, Inst Software, Beijing 100190, Peoples R China

[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China

[3] Chinese Acad Sci, State Key Lab Comp Sci, Beijing 100190, Peoples R China

Source:

2016 16TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID) | 2016

Keywords:

parallel stream compaction; automatic code generation; automatic code optimization; Xeon Phi

DOI:

10.1109/CCGrid.2016.112

Chinese Library Classification number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Stream compaction, frequently found in a large variety of applications, serves as a general primitive that reduces an input stream to a subset containing only the wanted elements so that the follow-on computation can be done efficiently. In this paper, we propose a fast parallel stream compaction for IA-based multi-/many-core processors. Unlike the previously studied algorithms that depend heavily on a black-box parallel scan, we open the black box in the proposed algorithm and manually tailor it so that both the workload and the memory footprint are significantly reduced. By further eliminating the conditional statements and applying automatic code generation/optimization for performance-critical kernels, the proposed parallel stream compaction achieves high performance in different cases and for various data types across different IA-based multi-/many-core platforms. Experimental results on three typical IA-based processors, including a quad-core Core-i7 CPU, a dual-socket 8-core Xeon CPU, and a 61-core Xeon Phi accelerator, show that the proposed implementation outperforms the referenced parallel counterpart in the state-of-the-art library Thrust. On top of the above, we apply it in a random-forest-based data classifier to show its potential to boost the performance of real-world applications.
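
For readers unfamiliar with the primitive, the following is a minimal sketch of a generic count/scan/scatter stream compaction, with the prefix sum taken over per-chunk counts rather than over every element. It is a textbook formulation and not the authors' implementation, which the abstract does not spell out. The function name `compact`, the predicate `keep`, and the parameter `num_chunks` are hypothetical, and the chunks are processed serially here although each chunk's work is independent and could be assigned to its own thread.

```python
from itertools import accumulate


def compact(data, keep, num_chunks=4):
    """Return the elements of data for which keep(x) is true, in order.

    Illustrative only: chunks stand in for per-thread work items.
    """
    n = len(data)
    # Split the index range into num_chunks nearly equal chunks.
    bounds = [n * c // num_chunks for c in range(num_chunks + 1)]

    # Phase 1: each chunk counts its wanted elements.
    counts = [
        sum(1 for x in data[bounds[c]:bounds[c + 1]] if keep(x))
        for c in range(num_chunks)
    ]

    # Phase 2: an exclusive prefix sum over the per-chunk counts gives
    # the position at which each chunk starts writing its output.
    offsets = [0] + list(accumulate(counts))

    # Phase 3: each chunk copies its wanted elements to its own region.
    out = [None] * offsets[-1]
    for c in range(num_chunks):
        pos = offsets[c]
        for x in data[bounds[c]:bounds[c + 1]]:
            if keep(x):
                out[pos] = x
                pos += 1
    return out


print(compact([3, -1, 4, -1, 5, -9, 2, 6], lambda x: x > 0))
```

Because the scan runs over one count per chunk, its cost depends on the number of chunks and not on the input length, which is one plausible way to reduce the scan workload compared with a per-element scan.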

Pages: 736-745

Page count: 10
