Optimizing VLIW Instruction Scheduling via a Two-Dimensional Constrained Dynamic Programming

被引:1
作者
Deng, Can [1 ,2 ]
Chen, Zhaoyun [1 ,2 ]
Shi, Yang [1 ,2 ]
Ma, Yimin [1 ,2 ]
Wen, Mei [1 ,2 ]
Luo, Lei [1 ,3 ]
机构
[1] Natl Univ Def Technol, Coll Comp, Deya Rd, Changsha 410000, Hunan, Peoples R China
[2] Key Lab Adv Microprocessor Chips & Syst, Changsha 410000, Peoples R China
[3] Natl Univ Def Technol, Sci & Technol Parallel & Distributed Proc Lab, Deya Rd, Changsha 410073, Hunan, Peoples R China
关键词
Embedded system; instruction scheduling; quantitative analysis; dynamic programming; VLIW; OPTIMIZATION;
D O I
10.1145/3643135
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Typical embedded processors, such as Digital Signal Processors (DSPs), usually adopt Very Long Instruction Word (VLIW) architecture to improve computing efficiency. The performance of VLIW processors heavily relies on Instruction-Level Parallelism (ILP). Therefore, it is crucial to develop an efficient instruction scheduling algorithm to explore more ILP. While heuristic algorithms are widely used in modern compilers due to simple implementation and low computational cost, they have limitations in providing accurate solutions and are prone to local optima. On the other hand, exact algorithms can usually find the optimal solution, but their high time overhead makes them less suitable for large-scale problems. This article proposes a twodimensional constrained dynamic programming (TDCDP) approach and a quantitative model for instruction scheduling. The TDCDP approach achieves near-optimal solutions within an acceptable time overhead. Furthermore, we integrate our TDCDP approach into mainstream compiler architecture, encompassing Pre- and Post-RA (register allocation) scheduling. We conduct a quantitative evaluation of TDCDP compared with four heuristic algorithms on a typical VLIW processor. Our approach achieves an efficiency improvement of up to 58.34% in final solutions compared with the heuristic algorithms. Additionally, the Post-RA Scheduling enhances programs with an average speedup of 14.04% than solely applying the Pre-RA Scheduling.
引用
收藏
页数:20
相关论文
共 39 条
  • [1] Bogdan Paul, 2019, P INT C HARDW SOFTW, DOI [10.1145/3349567.3357376, DOI 10.1145/3349567.3357376]
  • [2] A novel enhanced whale optimization algorithm for global optimization
    Chakraborty, Sanjoy
    Saha, Apu Kumar
    Sharma, Sushmita
    Mirjalili, Seyedali
    Chakraborty, Ratul
    [J]. COMPUTERS & INDUSTRIAL ENGINEERING, 2021, 153
  • [3] Chen SM, 2014, IEEE MICRO, V34, P63
  • [4] Deng C, 2022, ASIA S PACIF DES AUT, P256, DOI 10.1109/ASP-DAC52403.2022.9712500
  • [5] Learning Pareto-Frontier Resource Management Policies for Heterogeneous SoCs: An Information-Theoretic Approach
    Deshwal, Aryan
    Belakaria, Syrine
    Bhat, Ganapati
    Doppa, Janardhan Rao
    Pande, Partha Pratim
    [J]. 2021 58TH ACM/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2021, : 607 - 612
  • [6] Efficient Instruction Scheduling Using Real-time Load Delay Tracking
    Diavastos, Andreas
    Carlson, Trevor E.
    [J]. ACM TRANSACTIONS ON COMPUTER SYSTEMS, 2022, 40 (1-4):
  • [7] Faraboschi P, 2000, PROCEEDING OF THE 27TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, P203, DOI [10.1145/342001.339682, 10.1109/ISCA.2000.854391]
  • [8] Figielska E., 2014, Zeszyty Naukowe Warszawskiej Wyzszej Szkoly Informatyki, P29, DOI [10.26348/znwwsi.11.29, DOI 10.26348/ZNWWSI.11.29]
  • [9] Evolutionary Algorithms for Instruction Scheduling, Operation Merging, and Register Allocation in VLIW Compilers
    Giesemann, Florian
    Gerlach, Lukas
    Paya-Vaya, Guillermo
    [J]. JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2020, 92 (07): : 655 - 678
  • [10] Giesemann F, 2017, INTERNATIONAL CONFERENCE ON EMBEDDED COMPUTER SYSTEMS: ARCHITECTURES, MODELING, AND SIMULATION (SAMOS 2017), P179, DOI 10.1109/SAMOS.2017.8344626