A case for three-dimensional stacking of tightly coupled data memories over multi-core clusters using low-latency interconnects

Cited by: 5
Authors
Azarkhish, Erfan [1]
Loi, Igor [1]
Benini, Luca [1]
Affiliation
[1] Univ Bologna, DEI, Bologna, Italy
Keywords
DESIGN SPACE; NETWORKS; ROUTER
DOI
10.1049/iet-cdt.2013.0031
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Shared tightly coupled data memories are key architectural elements for building multi-core clusters in programmable accelerators and embedded systems, as they provide a convenient shared-memory abstraction while avoiding cache-coherence overheads. The performance of these memories depends largely on the architecture of the interconnect between the processing elements (PEs) and the memory banks. The advent of three-dimensional (3D) technology has created new opportunities to increase design modularity and to reduce latency and manufacturing cost. In this study, the authors propose two 3D network architectures, the centralised logarithmic interconnect (C-LIN) and the distributed logarithmic interconnect (D-LIN), both designed in synthesisable RTL, which allow multiple L1 memory dies to be stacked modularly over a multi-core cluster with a limited number of PEs. The authors consider two through-silicon-via (TSV) technologies: state-of-the-art micro-bumps and the promising, denser Cu-Cu direct bonding. The overhead of electrostatic discharge (ESD) protection circuits is also taken into account. Architectural simulation results show that, for processor-to-L1-memory traffic, C-LIN and D-LIN perform significantly better than traditional networks-on-chip (NoCs) and simple time-division multiplexing (TDM) buses. Furthermore, post-layout results show that the proposed 3D architectures achieve speeds comparable to their 2D counterparts while enabling modularity: L1 memory configurations from 256 kB to 2 MB can be built with a single mask set.
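The abstract gives no implementation details. As a rough illustration only, the Python sketch below models two ideas it relies on. The first is a word-interleaved mapping of addresses onto memory banks spread across several stacked L1 dies. The second is a per-bank arbitration step of the kind any logarithmic PE-to-bank interconnect needs when several PEs hit the same bank in the same cycle. The word size, bank counts, bank size, the interleaving scheme, the round-robin policy and the names `StackedL1`, `map_address` and `arbitrate` are all hypothetical assumptions. They are not the authors' C-LIN or D-LIN design.

```python
from dataclasses import dataclass

WORD_BYTES = 4  # assumption: 32-bit data words


@dataclass(frozen=True)
class StackedL1:
    """Hypothetical stacked L1 configuration; all parameters are illustrative."""
    num_dies: int        # number of stacked L1 memory dies
    banks_per_die: int   # SRAM banks on each die
    bank_bytes: int      # capacity of one bank in bytes

    @property
    def total_banks(self) -> int:
        return self.num_dies * self.banks_per_die

    @property
    def capacity_bytes(self) -> int:
        return self.total_banks * self.bank_bytes


def map_address(cfg: StackedL1, addr: int) -> tuple[int, int, int]:
    """Assumed word-interleaved mapping: consecutive words go to consecutive banks.

    Returns (die, bank_on_die, row) for a byte address.
    """
    word = addr // WORD_BYTES
    global_bank = word % cfg.total_banks
    row = word // cfg.total_banks
    return global_bank // cfg.banks_per_die, global_bank % cfg.banks_per_die, row


def arbitrate(cfg: StackedL1, requests: dict[int, int],
              rr_ptr: dict[int, int]) -> set[int]:
    """One cycle of per-bank round-robin arbitration (assumed policy).

    `requests` maps PE id -> byte address. `rr_ptr` maps a global bank index to
    the PE id that has priority next; it is updated in place. Returns the PE ids
    granted this cycle; the others are assumed to stall and retry.
    """
    contenders: dict[int, list[int]] = {}
    for pe, addr in requests.items():
        die, bank, _ = map_address(cfg, addr)
        contenders.setdefault(die * cfg.banks_per_die + bank, []).append(pe)

    granted: set[int] = set()
    for gbank, pes in contenders.items():
        start = rr_ptr.get(gbank, 0)
        # Lowest PE id at or after the priority pointer wins; otherwise wrap around.
        winner = min(pes, key=lambda pe: (pe < start, pe))
        granted.add(winner)
        rr_ptr[gbank] = winner + 1
    return granted
```

For example, assuming 16 banks of 16 kB per die, one die gives 256 kB and eight dies give 2 MB, which matches the range quoted in the abstract. With one die, byte addresses 0 and 64 both map to bank 0, so one of the two requesting PEs stalls. With eight dies, address 64 maps to bank 0 of die 1, and both requests are granted in the same cycle.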
Pages: 191-199
Page count: 9