Abstract
An effective failure-detection scheme is essential for reliable communication services. Most computer networks rely on behavior-based detection schemes: each node uses heartbeats to detect the failure of its neighbor nodes, and the transport protocol (such as TCP) achieves reliable communication through acknowledgment and retransmission. In this paper, we experimentally evaluate the effectiveness of such behavior-based detection schemes in real-time communication. Specifically, we measure and analyze the coverage and latency of two failure-detection schemes, neighbor detection and end-to-end detection, through fault-injection experiments. The experimental results show that a significant portion of failures can be detected very quickly by the neighbor detection scheme, while the end-to-end detection scheme uncovers the remaining failures with larger detection latencies.
Original language | English |
---|---|
Title of host publication | Digest of Papers - 27th Annual International Symposium on Fault-Tolerant Computing, FTCS 1997 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 122-131 |
Number of pages | 10 |
ISBN (Electronic) | 0818678313, 9780818678318 |
Publication status | Published - 1997 |
Event | 27th Annual International Symposium on Fault-Tolerant Computing, FTCS 1997 - Seattle, United States; Duration: 1997 Jun 24 → 1997 Jun 27 |
Publication series
Name | Digest of Papers - 27th Annual International Symposium on Fault-Tolerant Computing, FTCS 1997 |
---|
Other
Other | 27th Annual International Symposium on Fault-Tolerant Computing, FTCS 1997 |
---|---|
Country | United States |
City | Seattle |
Period | 97/6/24 → 97/6/27 |
Bibliographical note
Funding Information: The work reported in this paper was supported in part by the Advanced Research Projects Agency, monitored by the US Air Force Rome Laboratory under Grant F30602-95-1-0044, the National Science Foundation under Grant MIP-9203895, and the Office of Naval Research under Grant N00014-94-1-0229. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.
All Science Journal Classification (ASJC) codes
- Computer Science Applications
- Hardware and Architecture
- Software
- Safety, Risk, Reliability and Quality