MigSpike: A Migration Based Algorithms and Architecture for Scalable Robust Neuromorphic Systems

Times cited: 10
Authors
Dang, Khanh N. [1, 2]
Nguyen Anh Vu Doan [3]
Ben Abdallah, Abderazek [2]
Affiliations
[1] Vietnam Natl Univ, VNU Key Lab Smart Integrated Syst SISLAB, VNU Univ Engn & Technol, Hanoi 123106, Vietnam
[2] Univ Aizu, Grad Sch Comp Sci & Engn, Adapt Syst Lab, Aizu Wakamatsu, Fukushima 9658580, Japan
[3] Tech Univ Munich, Integrated Syst, D-80333 Munich, Germany
Keywords
Fault-tolerance; spiking neural network; neuromorphic system; network-on-chip; max flow; migration; SPIKING NEURAL-NETWORKS; REPAIR;
DOI
10.1109/TETC.2021.3136028
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Conventional hardware neuromorphic systems usually consist of multiple clusters of neurons that communicate via an interconnect infrastructure. Scaling them up raises a reliability issue, because faults in the neuron circuits and synaptic weight memories can cause faulty outputs. This work presents a method named MigSpike that allows placing spare neurons for repair, with the support of enhanced migration methods and a built-in hardware architecture for migrating neurons between nodes (clusters of neurons). The MigSpike architecture supports migrating the unmapped neurons from their nodes to suitable ones within the system by creating chains of migrations. Furthermore, a max-flow min-cut adaptation and a genetic algorithm approach are presented to solve the aforementioned problem. The evaluation results show that the proposed methods support recovery up to 100% of spare neurons. While the max-flow min-cut adaptation can execute in milliseconds, the genetic algorithm can help reduce the migration cost with a graceful degradation in communication cost. With a system of 256 neurons per node and a 20% fault rate, our approach reduces the migration cost from remapping by 10.19× and 96.13× under Networks-on-Chip of 4 × 4 (smallest) and 16 × 16 × 16 (largest), respectively. The Mean-Time-to-Failure evaluation also shows an approximately 10× lifetime expectancy with a 20% spare rate.
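The idea of finding chains of migrations with a max-flow formulation can be illustrated with a small sketch. This is not the paper's algorithm: the graph construction, the 2D mesh topology, the uniform `link_cap` limit, and the names `max_flow` and `plan_migrations` are all assumptions made for illustration. Each node with unmapped neurons is fed from a virtual source, each node with free spare neurons drains to a virtual sink, and neighbouring nodes are linked in both directions; the flow on the neighbour links then reads as a chain of node-to-node migrations.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp max flow on a dict-of-dicts graph.

    Returns (flow value, residual graph).
    """
    # Copy capacities and add zero-capacity reverse edges where missing.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)

    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total, residual

        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


def plan_migrations(rows, cols, unmapped, spare, link_cap):
    """Hypothetical planner: how many neurons move between adjacent nodes.

    unmapped / spare map (row, col) -> neuron counts (assumed inputs).
    link_cap is an assumed per-link limit on neurons migrated each way.
    """
    cap = {"S": {}, "T": {}}
    for r in range(rows):
        for c in range(cols):
            n = (r, c)
            cap.setdefault(n, {})
            if unmapped.get(n, 0) > 0:
                cap["S"][n] = unmapped[n]   # neurons that must leave node n
            if spare.get(n, 0) > 0:
                cap[n]["T"] = spare[n]      # free spare neurons at node n
            for dr, dc in ((0, 1), (1, 0)):  # right and down mesh neighbours
                m = (r + dr, c + dc)
                if m[0] < rows and m[1] < cols:
                    cap.setdefault(m, {})
                    cap[n][m] = link_cap
                    cap[m][n] = link_cap

    recovered, residual = max_flow(cap, "S", "T")

    # Net positive flow on a neighbour link is one hop of a migration chain.
    moves = {}
    for u, nbrs in cap.items():
        if u in ("S", "T"):
            continue
        for v, c in nbrs.items():
            if v == "T":
                continue
            sent = c - residual[u][v]
            if sent > 0:
                moves[(u, v)] = sent
    return recovered, moves


# A 1 x 3 row of nodes: three unmapped neurons on the left, two spares on the right.
recovered, moves = plan_migrations(
    1, 3, unmapped={(0, 0): 3}, spare={(0, 2): 2}, link_cap=10
)
print(recovered, moves)
```

In this toy case only two of the three unmapped neurons can be recovered, and the plan is a two-hop chain: two neurons move from node (0, 0) to (0, 1), and two move from (0, 1) to (0, 2), where the spares are. A cost-aware variant (for example, a genetic search over alternative chains, as the abstract mentions) would need a different objective than plain flow maximisation.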
Pages: 602-617
Number of pages: 16