Exploring fault-tolerant erasure codes for scalable all-flash array clusters

Sungjoon Koh, Jie Zhang, Miryeong Kwon, Jungyeon Yoon, David Donofrio, Nam Sung Kim, Myoungsoo Jung

Research output: Contribution to journal › Article

Abstract

Large-scale systems with all-flash arrays have become increasingly common in many computing segments. To make such systems resilient, we can adopt erasure coding such as the Reed-Solomon (RS) code as an alternative to replication, because erasure coding incurs a significantly lower storage overhead than replication. To understand the impact of erasure coding on system performance and on other system aspects such as CPU utilization and network traffic, we build a storage cluster that consists of approximately 100 processor cores with more than 50 high-performance solid-state drives (SSDs), and evaluate the cluster with a popular open-source distributed parallel file system, Ceph. Specifically, we analyze the behaviors of a system adopting erasure coding from the following five viewpoints, and compare them with those of another system using replication: (1) storage system I/O performance; (2) computing and software overheads; (3) I/O amplification; (4) network traffic among storage nodes; and (5) the impact of physical data layout on the performance of RS-coded SSD arrays. For all these analyses, we examine two representative RS configurations, used by Google file systems, and compare them with triple replication employed by a typical parallel file system as a default fault tolerance mechanism. Lastly, we collect 96 block-level traces from the cluster and release them to the public domain for the use of other researchers.
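The storage-overhead comparison in the abstract can be made concrete with a small calculation. The Python sketch below compares the raw-storage multiplier of Reed-Solomon coding with that of triple replication; RS(6, 3) and RS(10, 4) are illustrative assumptions only, since the abstract does not state the exact RS parameters evaluated.

# Illustrative sketch (not from the paper): raw-storage overhead of
# Reed-Solomon coding versus n-way replication. RS(6, 3) and RS(10, 4)
# are assumed example configurations.

def rs_overhead(k, m):
    # RS(k, m) splits an object into k data chunks and adds m parity
    # chunks, so it stores (k + m) / k raw bytes per logical byte and
    # survives the loss of any m chunks.
    return (k + m) / k

def replication_overhead(copies):
    # n-way replication keeps 'copies' full copies of every byte and
    # survives the loss of copies - 1 of them.
    return float(copies)

configs = [
    ("3-way replication", replication_overhead(3), 2),
    ("RS(6, 3)", rs_overhead(6, 3), 3),
    ("RS(10, 4)", rs_overhead(10, 4), 4),
]

for name, overhead, failures in configs:
    print(f"{name:>18}: {overhead:.2f}x raw storage, tolerates {failures} failures")

The lower multiplier (roughly 1.4-1.5x versus 3x under these assumed parameters) is what makes erasure coding attractive, but reading, updating, or reconstructing RS-coded data involves multiple chunks spread across nodes, which motivates the study's viewpoints on I/O amplification and inter-node network traffic.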

Original language: English
Article number: 8565994
Pages (from-to): 1312-1330
Number of pages: 19
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 30
Issue number: 6
DOI: 10.1109/TPDS.2018.2884722
Publication status: Published - 2019 Jun 1

Fingerprint

  • Reed-Solomon codes
  • Fault tolerance
  • Program processors
  • Amplification
  • Large scale systems

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics

Cite this

Koh, S., Zhang, J., Kwon, M., Yoon, J., Donofrio, D., Kim, N. S., & Jung, M. (2019). Exploring fault-tolerant erasure codes for scalable all-flash array clusters. IEEE Transactions on Parallel and Distributed Systems, 30(6), 1312-1330. [8565994]. https://doi.org/10.1109/TPDS.2018.2884722
Koh, Sungjoon; Zhang, Jie; Kwon, Miryeong; Yoon, Jungyeon; Donofrio, David; Kim, Nam Sung; Jung, Myoungsoo. / Exploring fault-tolerant erasure codes for scalable all-flash array clusters. In: IEEE Transactions on Parallel and Distributed Systems. 2019; Vol. 30, No. 6. pp. 1312-1330.
@article{551f52be783c4b309547cb9e1f571d49,
title = "Exploring fault-tolerant erasure codes for scalable all-flash array clusters",
abstract = "Large-scale systems with all-flash arrays have become increasingly common in many computing segments. To make such systems resilient, we can adopt erasure coding such as Reed-Solomon (RS) code as an alternative to replication because erasure coding incurs a significantly lower storage overhead than replication. To understand the impact of using erasure coding on the system performance and other system aspects such as CPU utilization and network traffic, we build a storage cluster that consists of approximately 100 processor cores with more than 50 high-performance solid-state drives (SSDs), and evaluate the cluster with a popular open-source distributed parallel file system, called Ceph. Specifically, we analyze the behaviors of a system adopting erasure coding from the following five viewpoints, and compare with those of another system using replication: (1) storage system I/O performance; (2) computing and software overheads; (3) I/O amplification; (4) network traffic among storage nodes, and (5) impact of physical data layout on performance of RS-coded SSD arrays. For all these analyses, we examine two representative RS configurations, used by Google file systems, and compare them with triple replication employed by a typical parallel file system as a default fault tolerance mechanism. Lastly, we collect 96 block-level traces from the cluster and release them to the public domain for the use of other researchers.",
author = "Sungjoon Koh and Jie Zhang and Miryeong Kwon and Jungyeon Yoon and David Donofrio and Kim, {Nam Sung} and Myoungsoo Jung",
year = "2019",
month = "6",
day = "1",
doi = "10.1109/TPDS.2018.2884722",
language = "English",
volume = "30",
pages = "1312--1330",
journal = "IEEE Transactions on Parallel and Distributed Systems",
issn = "1045-9219",
publisher = "IEEE Computer Society",
number = "6",

}

Koh, S, Zhang, J, Kwon, M, Yoon, J, Donofrio, D, Kim, NS & Jung, M 2019, 'Exploring fault-tolerant erasure codes for scalable all-flash array clusters', IEEE Transactions on Parallel and Distributed Systems, vol. 30, no. 6, 8565994, pp. 1312-1330. https://doi.org/10.1109/TPDS.2018.2884722

Exploring fault-tolerant erasure codes for scalable all-flash array clusters. / Koh, Sungjoon; Zhang, Jie; Kwon, Miryeong; Yoon, Jungyeon; Donofrio, David; Kim, Nam Sung; Jung, Myoungsoo.

In: IEEE Transactions on Parallel and Distributed Systems, Vol. 30, No. 6, 8565994, 01.06.2019, pp. 1312-1330.

Research output: Contribution to journal › Article

TY - JOUR

T1 - Exploring fault-tolerant erasure codes for scalable all-flash array clusters

AU - Koh, Sungjoon

AU - Zhang, Jie

AU - Kwon, Miryeong

AU - Yoon, Jungyeon

AU - Donofrio, David

AU - Kim, Nam Sung

AU - Jung, Myoungsoo

PY - 2019/6/1

Y1 - 2019/6/1

N2 - Large-scale systems with all-flash arrays have become increasingly common in many computing segments. To make such systems resilient, we can adopt erasure coding such as Reed-Solomon (RS) code as an alternative to replication because erasure coding incurs a significantly lower storage overhead than replication. To understand the impact of using erasure coding on the system performance and other system aspects such as CPU utilization and network traffic, we build a storage cluster that consists of approximately 100 processor cores with more than 50 high-performance solid-state drives (SSDs), and evaluate the cluster with a popular open-source distributed parallel file system, called Ceph. Specifically, we analyze the behaviors of a system adopting erasure coding from the following five viewpoints, and compare with those of another system using replication: (1) storage system I/O performance; (2) computing and software overheads; (3) I/O amplification; (4) network traffic among storage nodes, and (5) impact of physical data layout on performance of RS-coded SSD arrays. For all these analyses, we examine two representative RS configurations, used by Google file systems, and compare them with triple replication employed by a typical parallel file system as a default fault tolerance mechanism. Lastly, we collect 96 block-level traces from the cluster and release them to the public domain for the use of other researchers.

AB - Large-scale systems with all-flash arrays have become increasingly common in many computing segments. To make such systems resilient, we can adopt erasure coding such as Reed-Solomon (RS) code as an alternative to replication because erasure coding incurs a significantly lower storage overhead than replication. To understand the impact of using erasure coding on the system performance and other system aspects such as CPU utilization and network traffic, we build a storage cluster that consists of approximately 100 processor cores with more than 50 high-performance solid-state drives (SSDs), and evaluate the cluster with a popular open-source distributed parallel file system, called Ceph. Specifically, we analyze the behaviors of a system adopting erasure coding from the following five viewpoints, and compare with those of another system using replication: (1) storage system I/O performance; (2) computing and software overheads; (3) I/O amplification; (4) network traffic among storage nodes, and (5) impact of physical data layout on performance of RS-coded SSD arrays. For all these analyses, we examine two representative RS configurations, used by Google file systems, and compare them with triple replication employed by a typical parallel file system as a default fault tolerance mechanism. Lastly, we collect 96 block-level traces from the cluster and release them to the public domain for the use of other researchers.

UR - http://www.scopus.com/inward/record.url?scp=85058118778&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85058118778&partnerID=8YFLogxK

U2 - 10.1109/TPDS.2018.2884722

DO - 10.1109/TPDS.2018.2884722

M3 - Article

VL - 30

SP - 1312

EP - 1330

JO - IEEE Transactions on Parallel and Distributed Systems

JF - IEEE Transactions on Parallel and Distributed Systems

SN - 1045-9219

IS - 6

M1 - 8565994

ER -