Probabilistic cluster fault diagnosis for multiprocessor systems

Authors
Niu, Baohua [1 ]
Zhou, Shuming [1 ,2 ]
Zhang, Hong [1 ]
Zhang, Qifan [1 ]
Affiliations
[1] Fujian Normal Univ, Coll Math & Stat, Fuzhou 350117, Fujian, Peoples R China
[2] Fujian Normal Univ, Ctr Appl Math Fujian Prov, Fuzhou 350117, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Probabilistic diagnostic model; Cluster fault; Reliability; Conditional diagnosability; (N;
DOI
10.1016/j.tcs.2024.114837
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
As high-performance computing systems consisting of multiple processors play an important role in big data analytics, we are motivated to study the reliability, design-for-test, fault diagnosis and fault detection of large-scale multiprocessor interconnected systems. System-level diagnosis theory, which originates from the testing of VLSI and wafers, aims to identify faulty processors in these systems by analyzing the test results among the processors; diagnosability and diagnosis accuracy are its two important indices. The probabilistic fault diagnosis strategy seeks to diagnose processors correctly with high probability under the assumption that each processor has a certain failure probability. In this work, based on a probabilistic diagnosis algorithm that takes fault clustering into account, we focus on local diagnostic capability and establish the probability that any processor in a discrete status is diagnosed correctly. We then evaluate the global performance of multiprocessor systems under several significant fault distributions, including the Poisson, exponential and binomial distributions. In addition, we apply our results directly to the data center network HSDC and the (n, k)-star network. Numerical simulations verify the established results and reveal the relationship between the accuracy of correct diagnosis and the regulatory parameters.
Pages: 16