Reliable cluster computing with a new checkpointing RAID-x architecture

Kai Hwang, Hai Jin, Roy Ho, Wonwoo Ro

Research output: Contribution to journal › Article › peer-review

9 Citations (Scopus)

Abstract

In a serverless cluster of PCs or workstations, the cluster must allow remote file accesses or parallel I/O to be performed directly over the disks distributed across all client nodes. We introduce a new distributed disk array, called RAID-x, for use in serverless clusters. The RAID-x architecture is based on an orthogonal striping and mirroring (OSM) scheme, which exploits the full striping bandwidth of the disks and protects the system from any single disk failure. Experiments show that RAID-x outperforms RAID-1 and NFS in a Linux cluster environment. We also propose a new striped checkpointing scheme that leverages striped parallelism and the pipelined writing of successive disk stripes. The RAID-x architecture greatly enhances the throughput, reliability, and availability of scalable clusters, and it appeals especially to I/O-centric cluster applications.
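The abstract does not spell out the OSM block layout. The Python sketch below illustrates one way orthogonal striping and mirroring can be arranged, under stated assumptions: there are `n` disks, primary copies are striped round-robin across all disks (as in RAID-0), and the images of each group of `n - 1` consecutive blocks are clustered on the single disk that holds none of that group's primaries. The function names are hypothetical, and this is an illustration of the idea rather than the authors' implementation.

```python
def data_disk(block: int, n: int) -> int:
    """Disk holding the primary copy: plain round-robin striping."""
    if n < 2:
        raise ValueError("need at least two disks")
    return block % n


def mirror_disk(block: int, n: int) -> int:
    """Disk holding the image copy (assumed OSM-style placement).

    Assumption: images of each group of n-1 consecutive blocks are
    clustered on the one disk that stores none of that group's primaries.
    """
    if n < 2:
        raise ValueError("need at least two disks")
    first = (block // (n - 1)) * (n - 1)  # first block of this mirror group
    # The group's primaries occupy disks first % n .. (first + n - 2) % n,
    # so the only disk left over is (first + n - 1) % n.
    return (first + n - 1) % n


def survives_any_single_failure(n: int, num_blocks: int) -> bool:
    """Check that every block keeps at least one copy after any one disk fails."""
    for failed in range(n):
        for b in range(num_blocks):
            if data_disk(b, n) == failed and mirror_disk(b, n) == failed:
                return False
    return True


if __name__ == "__main__":
    n = 4
    for b in range(3 * n):
        print(f"B{b}: data on disk {data_disk(b, n)}, image on disk {mirror_disk(b, n)}")
    print("tolerates any single disk failure:", survives_any_single_failure(n, 3 * n))
```

With four disks, blocks B0 to B2 are striped over disks 0 to 2 while their images sit together on disk 3, B3 to B5 are imaged on disk 2, and so on. In a layout of this kind, reads and primary writes can use all `n` disks in parallel, while each group's images land on one disk, and because a block's primary and image never share a disk, any single disk failure leaves a surviving copy.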

Original language: English
Pages (from-to): 171-184
Number of pages: 14
Journal: Proceedings of the Heterogeneous Computing Workshop, HCW
Publication status: Published - 2000

All Science Journal Classification (ASJC) codes

  • Computer Science (all)
