A synthesis methodology for hybrid custom instruction and coprocessor generation for extensible processors

Cited by: 14
Authors
Sun, Fei
Ravi, Srivaths
Raghunathan, Anand
Jha, Niraj K.
Affiliations
[1] Princeton Univ, Dept Elect Engn, Princeton, NJ 08544 USA
[2] NEC Labs Amer Inc, Princeton, NJ 08540 USA
Funding
National Science Foundation (USA)
Keywords
Application-specific instruction set processor; coprocessor; custom instruction; extensible processor
DOI
10.1109/TCAD.2007.906457
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Systems-on-chip often use hardware accelerators or coprocessors to provide efficient implementations of application-specific functions. The recent emergence of extensible processor cores with supporting design tools has given designers another viable alternative, namely, the use of application-specific custom instructions. Coprocessors and custom instructions can be viewed as two different forms of hardware acceleration that are applicable at different levels of granularity and offer differing tradeoffs. Classical hardware/software-partitioning techniques and application-specific instruction-set design tools address the individual problems of coprocessor generation and custom-instruction addition. However, given a complex application, it is not clear which design choice (coprocessors, custom instructions, or a combination) will result in better performance, area, or power consumption. We demonstrate that a combination of custom instructions and coprocessors is the best solution in many applications, making the case for a hybrid custom-instruction and coprocessor-synthesis methodology. We propose such a methodology, which builds upon the basic observations that coprocessors are usually good for coarse-grained tasks and require minimal intervention or support from the processor, while custom instructions are usually suited to fine-grained operations that are best integrated into a processor pipeline. Our methodology uses a hierarchical task-graph representation to support both coarse- and fine-grained views of an application, which are necessary to make meaningful tradeoffs. We propose a hierarchical synthesis algorithm that incorporates multiobjective evolutionary optimization to handle different design dimensions, such as area and performance, and to provide a wide range of nondominated solutions. We have implemented the proposed methodology in the context of a commercial extensible processor-based platform (Xtensa from Tensilica). Our design flow uses a commercial behavioral-synthesis tool and an existing automatic custom-instruction generation tool. Our experiments with several applications show that simultaneous custom-instruction and coprocessor synthesis can achieve significantly better area/performance tradeoffs than using only one of them.
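As an illustration of what "nondominated solutions" means in the abstract, the following is a minimal sketch of a brute-force Pareto filter over two objectives, area and execution time, both minimized. It is not the paper's synthesis algorithm, which uses multiobjective evolutionary optimization; the `DesignPoint` type, the function names, and all numbers are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class DesignPoint(NamedTuple):
    name: str      # hypothetical label for a candidate implementation
    area: float    # hardware area overhead (arbitrary units), lower is better
    cycles: float  # execution time in cycles, lower is better


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    return (a.area <= b.area and a.cycles <= b.cycles
            and (a.area < b.area or a.cycles < b.cycles))


def nondominated(points: list[DesignPoint]) -> list[DesignPoint]:
    """Return the points that no other point dominates, sorted by area."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(front, key=lambda p: p.area)


# Made-up numbers for illustration only; they are not results from the paper.
candidates = [
    DesignPoint("software only", 0.0, 1000.0),
    DesignPoint("custom instructions only", 20.0, 700.0),
    DesignPoint("coprocessor only", 60.0, 650.0),
    DesignPoint("hybrid", 50.0, 400.0),
    DesignPoint("oversized coprocessor", 90.0, 640.0),
]

front = nondominated(candidates)
front_names = [p.name for p in front]
```

In this made-up data set, the two coprocessor-only points are dominated by the hybrid point, which is both smaller and faster, so they drop out of the front. The remaining points each represent a different area/performance tradeoff that a designer could choose from.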
Pages: 2035-2045
Number of pages: 11