Fault detection using hints from the socket layer

Cited by: 3

Authors:

Neves, N

Fuchs, WK

Affiliation:

Source:

SIXTEENTH SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS, PROCEEDINGS | 1997

Keywords:

DOI:

10.1109/RELDIS.1997.632799

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

This paper describes a fault detection mechanism that uses the error codes returned by stream sockets to locate process failures. Since these errors are generated automatically when there is communication with a failed process, the mechanism incurs no overhead during failure-free execution. However, for some types of faults, detection can only be achieved if the surviving processes use certain communication operations. To assess the coverage and latency of the proposed mechanism, faults were injected during the execution of parallel applications. Our results show that in most cases, faults could be detected using only the errors from the socket layer. Depending on the type of fault injected, detection latency ranged from a few milliseconds to less than 9 minutes.
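
As a rough illustration of the idea in the abstract, the following Python sketch treats errors from a stream socket as hints that the peer process has failed. It is not the authors' implementation: the helper names (`send_with_hint`, `recv_with_hint`, `demo`) are hypothetical, the set of error codes is an assumption chosen for illustration, and a local `socket.socketpair()` stands in for a connection to a remote process.

```python
import errno
import socket

# Error codes treated here as hints that the peer has failed.
# This set is an illustrative assumption, not the list used in the paper.
FAILURE_ERRNOS = {
    errno.EPIPE,
    errno.ECONNRESET,
    errno.ETIMEDOUT,
    errno.ECONNREFUSED,
    errno.EHOSTUNREACH,
}


def send_with_hint(sock, data):
    """Send data; return None on success, or the errno name hinting at a failed peer."""
    try:
        sock.sendall(data)
        return None
    except OSError as exc:
        if exc.errno in FAILURE_ERRNOS:
            return errno.errorcode[exc.errno]
        raise  # unrelated error: let the caller handle it


def recv_with_hint(sock, nbytes):
    """Receive up to nbytes; return (data, hint), where hint is None if no fault is suspected."""
    try:
        data = sock.recv(nbytes)
    except OSError as exc:
        if exc.errno in FAILURE_ERRNOS:
            return b"", errno.errorcode[exc.errno]
        raise
    if data == b"":
        # On a stream socket, an empty read means the peer's end was closed.
        return b"", "EOF"
    return data, None


def demo():
    """Simulate a peer failure by closing one end of a connected socket pair."""
    survivor, peer = socket.socketpair()
    peer.close()  # stands in for the OS closing a dead process's sockets
    try:
        recv_hint = recv_with_hint(survivor, 16)[1]
        send_hint = send_with_hint(survivor, b"ping")
    finally:
        survivor.close()
    return recv_hint, send_hint
```

No extra messages are exchanged while the peer is alive, so the check adds no traffic in the failure-free case; the hint only appears when the surviving side actually reads or writes. On a typical Linux system, `demo()` is expected to report an end-of-file hint on the read and `EPIPE` on the write. Faults such as a machine crash may surface only on some operations and only after protocol timeouts, which is consistent with the wide latency range reported above.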


Pages: 64-71

Page count: 8