Variability Mitigation in Nanometer CMOS Integrated Systems: A Survey of Techniques From Circuits to Software

Cited by: 27
Authors
Rahimi, Abbas [1]
Benini, Luca [2, 3]
Gupta, Rajesh K. [1]
Affiliations
[1] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
[2] Swiss Fed Inst Technol Zurich, Dept Informat Technol & Elect Engn, CH-8092 Zurich, Switzerland
[3] Univ Bologna, Dept Elect Elect & Informat Engn, I-40136 Bologna, Italy
Funding
National Science Foundation (USA);
Keywords
Approximate computing; resilient systems; timing errors; variability; LOW-POWER; DYNAMIC VOLTAGE; ENERGY-EFFICIENT; ERROR-CORRECTION; TIMING ERRORS; ARCHITECTURE; DESIGN; LOGIC; PERFORMANCE; MICROCONTROLLER;
DOI
10.1109/JPROC.2016.2518864
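Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code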
0808 ; 0809 ;
Abstract
Variation in performance and power across manufactured parts and their operating conditions is an accepted reality in modern microelectronic manufacturing processes with geometries at the nanometer scale. This article surveys challenges and opportunities in identifying variations and their effects, and in methods to combat these variations for improved microelectronic devices. We focus on computing devices and their design at various levels to combat variability. First, we provide a review of key concepts with particular emphasis on timing errors caused by various variability sources. We consider methods to predict and prevent such errors, methods to detect and correct them, and finally the conditions under which they can be accepted; we also consider their implications for cost, performance, and quality. We provide a comparative evaluation of methods for deployment across various layers of the system, from circuits and architecture to application software. These can be combined in various ways to achieve specific goals related to observability and controllability of the variability effects, providing means to achieve cross-layer or hybrid resilience. We then provide examples of real-world resilient single-core and parallel architectures. We find that parallel architectures and parallelism in general provide the best means to combat and exploit variability to design resilient and efficient systems. Using programmable accelerator architectures such as clustered processing elements and GP-GPUs, we show how system designers can coordinate propagation of timing error information and its effects along with new techniques for memoization (i.e., spatial or temporal reuse of computation). This discussion naturally leads to the use of these techniques in the emerging area of "approximate computing," and to how they can be used in building resilient and efficient computing systems. We conclude with an outlook for the emerging field.
Pages: 1410-1448
Page count: 39