Fast parallel stream compaction for IA-based multi/many-core processors

Cited by: 1
Authors
Sun, Qiao [1, 2]
Yang, Chao [1, 3]
Wu, Changmao [1]
Li, Leisheng [1]
Liu, Fangfang [1]
Affiliations
[1] Chinese Acad Sci, Inst Software, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Chinese Acad Sci, State Key Lab Comp Sci, Beijing 100190, Peoples R China
Keywords
parallel stream compaction; automatic code generation; automatic code optimization; Xeon Phi
DOI
10.1109/CCGrid.2016.112
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Stream compaction, which appears in a wide variety of applications, is a general primitive that reduces an input stream to the subset containing only the wanted elements so that follow-on computation can be done efficiently. In this paper, we propose a fast parallel stream compaction algorithm for IA-based multi-/many-core processors. Unlike previously studied algorithms, which depend heavily on a black-box parallel scan, the proposed algorithm opens the black box and manually tailors the scan so that both the workload and the memory footprint are significantly reduced. By further eliminating conditional statements and applying automatic code generation and optimization to performance-critical kernels, the proposed parallel stream compaction achieves high performance in different cases and for various data types across different IA-based multi-/many-core platforms. Experimental results on three typical IA-based processors (a quad-core Core i7 CPU, a dual-socket 8-core Xeon CPU, and a 61-core Xeon Phi accelerator) show that the proposed implementation outperforms its parallel counterpart in the state-of-the-art library Thrust. In addition, we apply it to a random-forest-based data classifier to show its potential to boost the performance of real-world applications.
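As a rough illustration of the ideas named in the abstract, the sketch below compacts a list in three phases: per-chunk counting, an exclusive scan over the chunk counts only (rather than a scan over every element), and a scatter that writes unconditionally and advances the output position by the keep flag instead of branching. This is not the authors' implementation. The function name `compact_chunked`, the `keep` predicate, and the `num_chunks` parameter are hypothetical, and the loops run sequentially here, although each chunk in phases 1 and 3 could in principle be handled by a separate thread.

```python
from itertools import accumulate


def compact_chunked(data, keep, num_chunks=4):
    """Return the elements of `data` for which `keep` is true, in order.

    Illustrative only: names and structure are assumptions, not the
    paper's code.
    """
    n = len(data)
    chunk = max(1, -(-n // num_chunks))  # ceiling division, at least 1
    bounds = [(lo, min(lo + chunk, n)) for lo in range(0, n, chunk)]

    # Phase 1: each chunk counts its wanted elements (independent per chunk).
    counts = [sum(int(keep(data[i])) for i in range(lo, hi))
              for lo, hi in bounds]

    # Phase 2: exclusive scan over the per-chunk counts only, so the scan
    # touches len(bounds) values instead of n values.
    offsets = [0] + list(accumulate(counts))
    total = offsets[-1]

    # Phase 3: scatter. Every element is written to the current position,
    # and the position advances by 0 or 1, so there is no if-statement.
    # The spare slot at the end absorbs a trailing write of an unwanted
    # element. A stray write at a chunk boundary is overwritten by the
    # next chunk here because chunks run in order; a truly parallel
    # version would have to handle that boundary case explicitly.
    out = [None] * (total + 1)
    for (lo, hi), pos in zip(bounds, offsets):
        for i in range(lo, hi):
            out[pos] = data[i]
            pos += int(keep(data[i]))
    return out[:total]


if __name__ == "__main__":
    print(compact_chunked([3, -1, 4, -1, 5, -9, 2, 6], lambda x: x > 0, 3))
    # prints [3, 4, 5, 2, 6]
```

Scanning only the chunk counts is one simple way to cut both the work and the temporary storage compared with a full per-element scan; whether it matches the tailoring described in the paper is not stated in this record.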
Pages: 736-745
Page count: 10