Experimental evaluation of behavior-based failure-detection schemes in real-time communication networks

Seungjae Han, Kang G. Shin

Research output: Contribution to journal, Article, peer-review

4 Citations (Scopus)

Abstract

Effective detection of failures is essential for reliable communication services. Traditionally, non-real-time computer networks have relied on behavior-based techniques for detecting communication failures. That is, each node uses heartbeats to detect the failure of its neighbors and the end-to-end transport protocol (e.g., TCP) achieves reliable communication by acknowledgment/retransmission. Recently, there has been a growing demand for reliable "real-time" communication, but little research has been done on the failure detection problem. In this paper, we present two behavior-based failure-detection schemes - neighbor detection and end-to-end detection - for reliable real-time communication services and experimentally evaluate their effectiveness. Specifically, we measure and analyze the coverage and latency of these detection schemes through fault-injection experiments. The experimental results have shown that nearly all failures can be detected very quickly by the neighbor detection scheme, while the end-to-end detection scheme uncovers the remaining failures with larger detection latencies.
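
The heartbeat idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `NeighborMonitor`, the function `detection_latency`, the 3-second timeout, and the node names are all hypothetical, chosen only to show how a node might suspect a silent neighbor and how a detection latency could be computed from a fault-injection time.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NeighborMonitor:
    """Tracks the last heartbeat time of each neighbor (hypothetical helper)."""
    timeout: float  # assumed: seconds of silence before a neighbor is suspected
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, neighbor: str, now: float) -> None:
        # Record the arrival time of a heartbeat from this neighbor.
        self.last_seen[neighbor] = now

    def suspected(self, now: float) -> List[str]:
        # A neighbor is suspected once it has been silent longer than the timeout.
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


def detection_latency(failure_time: float, detect_time: float) -> float:
    # Latency as used in this sketch: time from fault injection to detection.
    return detect_time - failure_time


monitor = NeighborMonitor(timeout=3.0)
monitor.heartbeat("B", now=0.0)
monitor.heartbeat("C", now=0.0)
monitor.heartbeat("B", now=2.0)    # B keeps sending; C goes silent after t=0
print(monitor.suspected(now=2.5))  # no neighbor has exceeded the timeout yet
print(monitor.suspected(now=4.0))  # C has been silent for 4.0 s, above 3.0 s
```

In a sketch like this, a shorter timeout lowers detection latency but raises the chance of suspecting a neighbor that is merely slow, which is one reason coverage and latency are measured together.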

Original language: English
Pages (from-to): 613-626
Number of pages: 14
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 10
Issue number: 6
Publication status: Published - 1999

Bibliographical note

Funding Information:
We would like to thank Harold Rosenberg and Jaehyun Park for their contribution in the implementation of DOCTOR. Atri Indiresan provided the skeleton of the real-time channel protocol used in our experiment. The work reported in this paper was supported in part by the U.S. National Science Foundation under Grant MIP-9203895, the U.S. Office of Naval Research under Grant N00014-94-1-0229, and Mitsubishi Electric Research Laboratory, Cambridge, Massachusetts. A subset of the materials of this paper was presented at the IEEE International Symposium on Fault-Tolerant Computing (FTCS-27), June 1997, in Seattle, Washington.

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics
