Fast parallel stream compaction for IA-based multi/many-core processors

Cited by: 1
Authors
Sun, Qiao [1, 2]
Yang, Chao [1, 3]
Wu, Changmao [1 ]
Li, Leisheng [1 ]
Liu, Fangfang [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Software, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Chinese Acad Sci, State Key Lab Comp Sci, Beijing 100190, Peoples R China
Keywords
parallel stream compaction; automatic code generation; automatic code optimization; Xeon Phi
DOI
10.1109/CCGrid.2016.112
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Stream compaction, which appears in a wide variety of applications, is a general primitive that reduces an input stream to the subset containing only the wanted elements, so that follow-on computation can be done efficiently. In this paper, we propose a fast parallel stream compaction algorithm for IA-based multi-/many-core processors. Unlike previously studied algorithms, which depend heavily on a black-box parallel scan, the proposed algorithm opens the black box and manually tailors the scan so that both the workload and the memory footprint are significantly reduced. By further eliminating conditional statements and applying automatic code generation/optimization to performance-critical kernels, the proposed parallel stream compaction achieves high performance in different cases and for various data types across different IA-based multi-/many-core platforms. Experimental results on three typical IA-based processors (a quad-core Core i7 CPU, a dual-socket 8-core Xeon CPU, and a 61-core Xeon Phi accelerator) show that the proposed implementation outperforms its parallel counterpart in the state-of-the-art library Thrust. In addition, we apply it to a random-forest-based data classifier to show its potential to boost the performance of real-world applications.
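For readers unfamiliar with the primitive, the following is a minimal serial sketch of stream compaction, not the authors' implementation. It assumes a generic scan-based formulation (flag, exclusive prefix sum, scatter) of the kind the abstract contrasts against, plus one common way of removing the conditional from the inner loop. The function names `compact_scan` and `compact_branch_free` are hypothetical and do not come from the paper or from Thrust.

```python
from itertools import accumulate
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def compact_scan(data: Sequence[T], keep: Callable[[T], bool]) -> List[T]:
    """Scan-based compaction: flag, exclusive prefix sum, scatter."""
    # 1. Flag each element: 1 if it is wanted, 0 otherwise.
    flags = [1 if keep(x) else 0 for x in data]
    # 2. Running sum with a leading 0: offsets[i] is the output slot of
    #    element i, and offsets[-1] is the total number of kept elements.
    offsets = list(accumulate(flags, initial=0))
    # 3. Scatter every flagged element to its slot.
    out: List[T] = [None] * offsets[-1]  # type: ignore[list-item]
    for i, x in enumerate(data):
        if flags[i]:
            out[offsets[i]] = x
    return out


def compact_branch_free(data: Sequence[T], keep: Callable[[T], bool]) -> List[T]:
    """Compaction without a conditional in the loop body (illustrative only)."""
    out: List[T] = [None] * len(data)  # type: ignore[list-item]
    pos = 0
    for x in data:
        out[pos] = x          # store unconditionally
        pos += int(keep(x))   # advance the write cursor only if x is kept
    return out[:pos]


if __name__ == "__main__":
    stream = [3, -1, 4, -1, 5, -9, 2, 6]
    print(compact_scan(stream, lambda v: v > 0))
    print(compact_branch_free(stream, lambda v: v > 0))
```

In the second function a rejected element is simply overwritten by the next store, so the loop body needs no branch on the predicate. A parallel version would typically give each thread a contiguous chunk and combine per-chunk counts, but that step is outside this sketch.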
Pages: 736 - 745
Page count: 10