Experimental evaluation of behavior-based failure-detection schemes in real-time communication networks

Seungjae Han, Kang G. Shin

Research output: Contribution to journal (Article)

4 Citations (Scopus)

Abstract

Effective detection of failures is essential for reliable communication services. Traditionally, non-real-time computer networks have relied on behavior-based techniques for detecting communication failures. That is, each node uses heartbeats to detect the failure of its neighbors and the end-to-end transport protocol (e.g., TCP) achieves reliable communication by acknowledgment/retransmission. Recently, there has been a growing demand for reliable "real-time" communication, but little research has been done on the failure detection problem. In this paper, we present two behavior-based failure-detection schemes, neighbor detection and end-to-end detection, for reliable real-time communication services and experimentally evaluate their effectiveness. Specifically, we measure and analyze the coverage and latency of these detection schemes through fault-injection experiments. The experimental results have shown that nearly all failures can be detected very quickly by the neighbor detection scheme, while the end-to-end detection scheme uncovers the remaining failures with larger detection latencies.
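The heartbeat idea and the two metrics named in the abstract (coverage and latency) can be illustrated with a minimal sketch. This is not the authors' implementation: the class `NeighborMonitor`, the parameters `period` and `max_missed`, and the helper functions are hypothetical names chosen for illustration, and the timeout rule (suspect a neighbor after several silent heartbeat periods) is a common convention assumed here rather than one taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class NeighborMonitor:
    """Suspects a neighbor once it has been silent for more than
    `max_missed` heartbeat periods (illustrative rule, assumed)."""
    period: float            # expected heartbeat interval in seconds (assumed)
    max_missed: int = 3      # tolerated silent periods before suspicion (assumed)
    last_heard: float = 0.0  # timestamp of the most recent heartbeat

    def on_heartbeat(self, now: float) -> None:
        # Record the arrival time of a heartbeat from the neighbor.
        self.last_heard = now

    def is_suspected(self, now: float) -> bool:
        # True when the silence exceeds the allowed window.
        return now - self.last_heard > self.period * self.max_missed


def coverage(latencies: Sequence[Optional[float]]) -> float:
    """Fraction of injected faults that were detected.
    Each entry is a detection latency, or None if the fault went undetected."""
    if not latencies:
        raise ValueError("no fault-injection runs recorded")
    return sum(lat is not None for lat in latencies) / len(latencies)


def mean_latency(latencies: Sequence[Optional[float]]) -> Optional[float]:
    """Average latency over detected faults only; None if nothing was detected."""
    detected = [lat for lat in latencies if lat is not None]
    return sum(detected) / len(detected) if detected else None
```

In a fault-injection campaign of this shape, each run would contribute one entry (a latency or `None`), and the two helpers would summarize the campaign as a coverage figure and an average detection latency.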

Original language: English
Pages (from-to): 613-626
Number of pages: 14
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 10
Issue number: 6
DOI: 10.1109/71.774910
Publication status: Published - 1999 Jan 1

Fingerprint

Telecommunication networks
Communication
Computer networks
Network protocols
Experiments

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics

Cite this

@article{fe5ea518c2b14625aaaa44750e7d0ca7,
title = "Experimental evaluation of behavior-based failure-detection schemes in real-time communication networks",
author = "Seungjae Han and Shin, {Kang G.}",
year = "1999",
month = "1",
day = "1",
doi = "10.1109/71.774910",
language = "English",
volume = "10",
pages = "613--626",
journal = "IEEE Transactions on Parallel and Distributed Systems",
issn = "1045-9219",
publisher = "IEEE Computer Society",
number = "6",

}

Experimental evaluation of behavior-based failure-detection schemes in real-time communication networks. / Han, Seungjae; Shin, Kang G.

In: IEEE Transactions on Parallel and Distributed Systems, Vol. 10, No. 6, 01.01.1999, p. 613-626.


TY - JOUR

T1 - Experimental evaluation of behavior-based failure-detection schemes in real-time communication networks

AU - Han, Seungjae

AU - Shin, Kang G.

PY - 1999/1/1

Y1 - 1999/1/1



UR - http://www.scopus.com/inward/record.url?scp=0032624089&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0032624089&partnerID=8YFLogxK

U2 - 10.1109/71.774910

DO - 10.1109/71.774910

M3 - Article

AN - SCOPUS:0032624089

VL - 10

SP - 613

EP - 626

JO - IEEE Transactions on Parallel and Distributed Systems

JF - IEEE Transactions on Parallel and Distributed Systems

SN - 1045-9219

IS - 6

ER -