Reconciling fault-tolerant distributed computing and systems-on-chip

Cited by: 22
Authors
Fuegger, Matthias [1]
Schmid, Ulrich [1]
Affiliations
[1] Tech Univ Wien, Embedded Comp Syst Grp E182 2, A-1040 Vienna, Austria
Funding
Austrian Science Fund (FWF)
Keywords
Clock synchronization; Fault-tolerant distributed systems; Modeling approaches; VLSI; CLOCK SYNCHRONIZATION; SOFT ERRORS; DESIGN; IMPOSSIBILITY; ARCHITECTURE; CONSENSUS; CIRCUITS; ISSUES; TRENDS
DOI
10.1007/s00446-011-0151-7
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Classic distributed computing abstractions do not match the reality of digital logic gates well; these gates are the elementary building blocks of Systems-on-Chip (SoCs) and other Very Large Scale Integrated (VLSI) circuits. Massively concurrent, continuous computations undermine the concept of sequential processes executing sequences of atomic zero-time computing steps, and very limited computational resources at gate level make even simple operations prohibitively costly. In this paper, we introduce a modeling and analysis framework based on continuous computations and zero-bit message channels, and employ this framework for the correctness and performance analysis of a distributed fault-tolerant clocking approach for SoCs. Starting from a "classic" distributed Byzantine fault-tolerant tick generation algorithm, we show how to adapt it for direct implementation in clockless digital logic, rigorously prove its correctness, and derive analytic expressions for worst-case performance metrics such as synchronization precision and clock frequency. Both the algorithm's correctness and the achievable synchronization precision depend solely on the ratios of certain path delays rather than on absolute delay values. Since these ratios can be mapped directly to placement and routing constraints, there is typically no need to change the algorithm when migrating to a faster implementation technology or when using a slightly different layout in an SoC.
Pages: 323-355
Page count: 33