Superblock-based performance optimization for Sunway Math Library on SW26010 many-core processor

Cited by: 0
Authors
Hao Cao
Shaozhong Guo
Jiangwei Hao
Yuanyuan Xia
Jinchen Xu
Affiliations
[1] State Key Laboratory of Mathematical Engineering and Advanced Computing,
Source
The Journal of Supercomputing | 2022 / Vol. 78
Keywords
Assembly; Performance optimization; Superblock scheduling; SW26010
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
The SW26010 many-core processor is based on the Sunway architecture, which is composed of management processing elements (MPEs) and computing processing elements (CPEs), each of which is equipped with a stand-alone math library. The problem is that every version of the Sunway Math Library (SML) is written in assembly, which puts it beyond the reach of compilers that take high-level languages as input; existing optimization approaches therefore rely mainly on manual strategies, which are considered inefficient. In this paper, we leverage superblock scheduling, a well-known compilation technique, and present a tool named SMPOT to optimize the SML. SMPOT first builds a superblock using a novel tail duplication algorithm, then applies code motion restrictions to avoid compensation code, and then matches the machine model. Finally, it reorders the instructions on the main path with an activation algorithm based on the available computing resources. The experimental results show that SMPOT effectively improves the performance of the SML. For MPE functions, main path performance improves by 10.61% on average and overall performance by 5.40%. For CPE functions, main path performance improves by 5.72% on average and overall performance by 2.98%.
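As an illustration of the superblock idea mentioned in the abstract, the following minimal sketch shows textbook tail duplication on a toy control-flow graph. It is not SMPOT's novel algorithm, which the abstract does not describe; the function name `form_superblock`, the dictionary-based graph representation, and the block names are all assumptions made for this example. A superblock is a path with a single entry at the top, so any block on the main path that can be entered from the side is copied, together with the rest of the path, and the side entrances are redirected to the copies.

```python
def form_superblock(cfg, trace):
    """Remove side entrances from `trace` by classic tail duplication.

    cfg   : dict mapping a block name to the list of its successor names
    trace : list of distinct block names forming the main path, head first
    Returns a new CFG; the input is left unmodified.
    (Hypothetical helper for illustration, not SMPOT's algorithm.)
    """
    new_cfg = {block: list(succs) for block, succs in cfg.items()}

    # A side entrance is any edge into a non-head trace block that does not
    # come from the block immediately before it on the trace.
    side_entries = {}
    for i in range(1, len(trace)):
        preds = [p for p, succs in cfg.items()
                 if trace[i] in succs and p != trace[i - 1]]
        if preds:
            side_entries[trace[i]] = preds
    if not side_entries:
        return new_cfg  # the trace is already a superblock

    # Duplicate everything from the first side-entered block to the end.
    first = min(trace.index(block) for block in side_entries)
    tail = trace[first:]
    copy_of = {block: block + "'" for block in tail}
    for block in tail:
        # Edges inside the tail point at copies; other edges are unchanged.
        new_cfg[copy_of[block]] = [copy_of.get(s, s) for s in cfg[block]]

    # Redirect each side entrance to the duplicated block.
    for target, preds in side_entries.items():
        for p in preds:
            new_cfg[p] = [copy_of[target] if s == target else s
                          for s in new_cfg[p]]
    return new_cfg


# Toy diamond: A branches to B (main path) or C; both rejoin at D.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
result = form_superblock(cfg, ["A", "B", "D", "E"])
```

In this toy graph the edge from C to D is the only side entrance, so D and E are copied and C is redirected to the copy of D. The main path A, B, D, E can then be scheduled as one unit, since no instruction moved within it can be skipped by a jump into the middle of the path.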
Pages: 4827-4849
Page count: 22