Toward efficient structured-grid triangular solver on sunway many-core processors

Authors
Jianjiang Li
Jiabi Liang
Wei Xue
Zhengding Hu
Lin Li
Jinliang Shi
Affiliations
[1] University of Science and Technology Beijing, Department of Computer Science and Technology
[2] Tsinghua University, Department of Computer Science and Technology
[3] University of Science and Technology of China, Department of Computer Science and Technology
Source
The Journal of Supercomputing | 2024 / Vol. 80
Keywords
Many-core processor SW26010; Structured-grid problems; Parallel computing; Heterogeneous parallel optimization
DOI
Not available
Abstract
The sparse triangular solver (SpTRSV) is widely used in scientific and engineering applications. The structured-grid triangular solver with regular dependencies (STRSV) is a special case of SpTRSV. General-purpose SpTRSV methods that disregard the regularity of the matrix are ill-suited to this problem. This paper proposes swStructTRSV, an efficient parallel algorithm for STRSV on the SW26010, a many-core processor designed independently in China. The algorithm exploits the fine-grained, low-latency communication of the SW26010 to reduce synchronization waiting time, maximizes the regularity of memory accesses to improve memory bandwidth, and overlaps memory access with computation. Moreover, because the dependencies are the same, the idea of the algorithm can be extended to incomplete LU (ILU) factorization. Experimental results on one core group of the SW26010 (an 8 × 8 network of 64 cores) show that swStructTRSV achieves an average speedup of more than 30 over the sequential version. swStructTRSV on the SW26010 achieves solve speedups of 2.2 and 6.3 over the fast STRSV (fSpTRSV) previously implemented on the SW26010 and over MKL on an Intel Xeon Gold 6132, respectively. swStructTRSV also significantly outperforms cuSparse on an NVIDIA TITAN RTX in overall execution time.
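To illustrate what "regular dependencies" means here, the following is a minimal Python sketch, not the swStructTRSV algorithm itself. It assumes a 3D grid with a 7-point stencil, so each unknown in the lower-triangular solve depends only on its three already-solved neighbours. The function names (`strsv_sequential`, `strsv_wavefront`) and the coefficient arrays (`diag`, `west`, `south`, `down`) are hypothetical and do not come from the paper. The second function groups grid points by the hyperplane i + j + k, a standard wavefront ordering in which all points on one hyperplane are mutually independent and could be solved in parallel.

```python
import numpy as np


def strsv_sequential(diag, west, south, down, b):
    """Forward substitution in lexicographic order on an nx * ny * nz grid.

    Assumed stencil: x[i,j,k] depends on x[i-1,j,k], x[i,j-1,k], x[i,j,k-1].
    """
    nx, ny, nz = b.shape
    x = np.zeros_like(b)
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                acc = b[i, j, k]
                if i > 0:
                    acc -= west[i, j, k] * x[i - 1, j, k]
                if j > 0:
                    acc -= south[i, j, k] * x[i, j - 1, k]
                if k > 0:
                    acc -= down[i, j, k] * x[i, j, k - 1]
                x[i, j, k] = acc / diag[i, j, k]
    return x


def strsv_wavefront(diag, west, south, down, b):
    """Same solve, ordered by hyperplanes i + j + k = w.

    Points on one hyperplane only read values from hyperplane w - 1,
    so each hyperplane is processed here as one vectorized step.
    """
    nx, ny, nz = b.shape
    x = np.zeros_like(b)
    i, j, k = np.indices(b.shape)
    level = i + j + k
    for w in range(nx + ny + nz - 2):  # levels 0 .. nx + ny + nz - 3
        ii, jj, kk = np.nonzero(level == w)
        acc = b[ii, jj, kk]  # fancy indexing returns a fresh copy
        m = ii > 0
        acc[m] -= west[ii[m], jj[m], kk[m]] * x[ii[m] - 1, jj[m], kk[m]]
        m = jj > 0
        acc[m] -= south[ii[m], jj[m], kk[m]] * x[ii[m], jj[m] - 1, kk[m]]
        m = kk > 0
        acc[m] -= down[ii[m], jj[m], kk[m]] * x[ii[m], jj[m], kk[m] - 1]
        x[ii, jj, kk] = acc / diag[ii, jj, kk]
    return x
```

Because the dependency pattern is fixed by the grid rather than stored per nonzero, no dependency graph or level-set analysis of a general sparse matrix is needed. This is the kind of regularity the abstract says general SpTRSV methods disregard.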
Pages: 10610-10636
Page count: 26