Architecture Support for Task Out-of-Order Execution in MPSoCs

Cited by: 16
Authors
Wang, Chao [1]
Li, Xi [1]
Zhang, Junneng [2]
Chen, Peng [1]
Chen, Yunji [3]
Zhou, Xuehai [4]
Cheung, Ray C. C. [5]
Affiliations
[1] Univ Sci & Technol China, Dept Comp Sci, Hefei 230027, Anhui, Peoples R China
[2] Univ Sci & Technol China, Hefei 230027, Anhui, Peoples R China
[3] Chinese Acad Sci, CARCH, State Key Lab, Beijing 100190, Peoples R China
[4] Univ Sci & Technol China, Suzhou Inst, Suzhou 215123, Peoples R China
[5] City Univ Hong Kong, Dept Elect Engn, Kowloon, Hong Kong, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Middleware; architecture support; MPSoC; data dependencies; FPGA; out-of-order execution;
DOI
10.1109/TC.2014.2315628
CLC number (Chinese Library Classification)
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
Multi-processor systems on chip (MPSoCs) have been widely applied in embedded systems over the past decades. However, heterogeneous instruction set architectures (ISAs), programming interfaces, and software tool chains make it highly challenging to design and implement rapid prototypes efficiently for diverse applications. To address this problem, this paper proposes a novel high-level architecture support for automatic out-of-order (OoO) task execution on FPGA-based heterogeneous MPSoCs. The architecture support consists of a hierarchical middleware with an automatic task-level OoO parallel execution engine. Incorporating a hierarchical OoO layer model, the middleware identifies parallel regions and generates source code automatically. In addition, a runtime middleware component, Task-Scoreboarding, analyzes inter-task data dependencies and automatically schedules and dispatches tasks using parameter renaming techniques. The middleware has been verified on a prototype built on an FPGA platform. Examples and a JPEG case study demonstrate that the model substantially eases the burden on programmers while uncovering task-level parallelism.
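The abstract does not specify how Task-Scoreboarding works internally. As a rough illustration only, the sketch below assumes each task declares the parameters it reads and writes, and shows how renaming every written parameter to a fresh version (analogous to register renaming in OoO processors) leaves only true read-after-write (RAW) dependencies, so tasks blocked solely by write-after-read (WAR) or write-after-write (WAW) conflicts can issue early. The names `Task` and `rename_and_schedule` and the static "wave" grouping are hypothetical; a real runtime scheduler would track task completion dynamically rather than precomputing waves.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    inputs: list[str]   # logical parameters the task reads
    outputs: list[str]  # logical parameters the task writes


def rename_and_schedule(tasks):
    """Hypothetical scoreboard sketch: rename outputs so only RAW
    dependencies remain, then group tasks into parallel issue waves."""
    version = {}   # logical parameter -> latest version number
    producer = {}  # renamed parameter (e.g. "b.2") -> index of writing task
    level = []     # issue wave assigned to each task
    renamed = []
    for i, t in enumerate(tasks):
        # Inputs read the latest version (true RAW dependency).
        ins = [f"{p}.{version.get(p, 0)}" for p in t.inputs]
        # Each write creates a fresh version, which removes WAR and WAW hazards.
        outs = []
        for p in t.outputs:
            version[p] = version.get(p, 0) + 1
            outs.append(f"{p}.{version[p]}")
        # A task depends only on the tasks that produced the versions it reads.
        deps = sorted({producer[p] for p in ins if p in producer})
        level.append(1 + max((level[d] for d in deps), default=-1))
        for p in outs:
            producer[p] = i
        renamed.append((t.name, ins, outs, deps))
    waves = {}
    for i, lvl in enumerate(level):
        waves.setdefault(lvl, []).append(tasks[i].name)
    return renamed, [waves[k] for k in sorted(waves)]


tasks = [
    Task("T1", ["a"], ["b"]),  # writes b
    Task("T2", ["b"], ["c"]),  # RAW on b: must follow T1
    Task("T3", ["a"], ["b"]),  # WAW with T1 and WAR with T2 on b
    Task("T4", ["b"], ["d"]),  # RAW on T3's b
]
renamed, waves = rename_and_schedule(tasks)
print(waves)  # [['T1', 'T3'], ['T2', 'T4']]
```

Without renaming, T3 would have to wait for T2 (WAR on `b`) and T1 (WAW on `b`); with renaming it writes `b.2` instead of `b.1` and issues in the first wave alongside T1.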
Pages: 1296-1310
Page count: 15