Fast parallel stream compaction for IA-based multi/many-core processors

Cited by: 1
Authors
Sun, Qiao [1 ,2 ]
Yang, Chao [1 ,3 ]
Wu, Changmao [1 ]
Li, Leisheng [1 ]
Liu, Fangfang [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Software, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Chinese Acad Sci, State Key Lab Comp Sci, Beijing 100190, Peoples R China
Keywords
parallel stream compaction; automatic code generation; automatic code optimization; Xeon Phi;
DOI
10.1109/CCGrid.2016.112
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Stream compaction, frequently found in a large variety of applications, serves as a general primitive that reduces an input stream to a subset containing only the wanted elements so that the follow-on computation can be done efficiently. In this paper, we propose a fast parallel stream compaction algorithm for IA-based multi-/many-core processors. Unlike previously studied algorithms that depend heavily on a black-box parallel scan, the proposed algorithm opens the black box and manually tailors it so that both the workload and the memory footprint are significantly reduced. By further eliminating conditional statements and applying automatic code generation/optimization to performance-critical kernels, the proposed parallel stream compaction achieves high performance in different cases and for various data types across different IA-based multi-/many-core platforms. Experimental results on three typical IA-based processors, namely a quad-core Core-i7 CPU, a dual-socket 8-core Xeon CPU, and a 61-core Xeon Phi accelerator, show that the proposed implementation outperforms its parallel counterpart in the state-of-the-art library Thrust. In addition, we apply it in a random-forest-based data classifier to show its potential to boost the performance of real-world applications.
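To make the primitive concrete, the following is a minimal C++ sketch of a generic blocked stream compaction. It is not the algorithm from the paper, whose details the abstract does not give. It only shows the usual count / prefix-sum / scatter structure, with the scan running over one counter per block instead of one flag per element, and with a counting loop that adds the predicate result rather than branching on it. The names `compact_blocked`, `num_blocks`, and `keep` are hypothetical, and the block loops are written sequentially here; phases 1 and 3 touch disjoint data per block, so they are the parts one would expect to run in parallel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch only: generic blocked compaction, not the paper's method.
// Keeps the elements of `in` for which `keep` returns true, preserving order.
template <typename T, typename Pred>
std::vector<T> compact_blocked(const std::vector<T>& in, std::size_t num_blocks, Pred keep) {
    assert(num_blocks > 0);
    const std::size_t n = in.size();
    const std::size_t block = (n + num_blocks - 1) / num_blocks;  // ceil(n / num_blocks)

    // Phase 1: count the kept elements of each block.
    // The predicate result is added directly, so the loop body has no `if`.
    std::vector<std::size_t> offset(num_blocks + 1, 0);
    for (std::size_t b = 0; b < num_blocks; ++b) {
        const std::size_t lo = std::min(n, b * block);
        const std::size_t hi = std::min(n, lo + block);
        std::size_t count = 0;
        for (std::size_t i = lo; i < hi; ++i) {
            count += static_cast<std::size_t>(keep(in[i]));
        }
        offset[b + 1] = count;
    }

    // Phase 2: prefix sum over the per-block counts only (num_blocks values,
    // not n), turning counts into the output start offset of each block.
    for (std::size_t b = 0; b < num_blocks; ++b) {
        offset[b + 1] += offset[b];
    }

    // Phase 3: each block copies its kept elements to its own output range.
    std::vector<T> out(offset[num_blocks]);
    for (std::size_t b = 0; b < num_blocks; ++b) {
        const std::size_t lo = std::min(n, b * block);
        const std::size_t hi = std::min(n, lo + block);
        std::size_t pos = offset[b];
        for (std::size_t i = lo; i < hi; ++i) {
            if (keep(in[i])) {
                out[pos++] = in[i];
            }
        }
    }
    return out;
}
```

For example, calling `compact_blocked` on `{5, -1, 8, 0, -3, 7, 2}` with three blocks and a predicate that keeps positive values yields `{5, 8, 7, 2}`. The scatter phase keeps an ordinary conditional for clarity; removing it safely requires extra care so that blocks do not write into each other's output ranges.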
Pages: 736-745
Number of pages: 10