Selective I/O bypass and load balancing method for write-through SSD caching in big data analytics

Jaehyung Kim, Hongchan Roh, Sang Hyun Park

Research output: Contribution to journal (Article)

3 Citations (Scopus)

Abstract

Fast network quality analysis is essential for providing quality service in the telecom industry. SK Telecom, based in South Korea, built a Hadoop-based analytical system of a hundred nodes, each containing only hard disk drives (HDDs). Because the analysis process is a set of parallel, I/O-intensive jobs, previous studies have shown that adding solid state drives (SSDs) with appropriate settings is the most cost-efficient way to improve performance. We therefore configured SSDs as a write-through cache instead of adding more HDDs. To improve the cost-per-performance of the SSD cache, we introduced a selective I/O bypass (SIB) method: when the SSDs are I/O over-saturated, meaning their disk utilization exceeds 100 percent, SIB redirects an automatically calculated number of read I/O requests from the SSD cache to idle HDDs. Because the method currently used to calculate HDD utilization cannot be applied to SSDs, owing to their internal parallelism, we also introduced a combinational approach to calculate SSD utilization precisely. In our experiments, the proposed approach performed up to 2x faster than other approaches.
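The abstract names two mechanisms without giving their formulas: an SSD utilization metric that can exceed 100 percent, and a bypass that moves a calculated number of reads to idle HDDs. The Python sketch that follows illustrates one way these ideas could fit together under stated assumptions; it is not the authors' method, whose combinational formula is not given in this record. Assumptions: the per-interval counters mirror the Linux `/proc/diskstats` fields "time spent doing I/Os" and "weighted time spent doing I/Os"; SSD utilization is approximated as the average number of in-flight requests divided by a hypothetical `parallelism` parameter; `DiskSample`, `requests_to_bypass`, `assign_to_hdds`, and the 0.5 idle threshold are illustrative names and values only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DiskSample:
    """Counter deltas for one device over one sampling interval (hypothetical)."""
    interval_ms: float     # length of the sampling interval
    busy_ms: float         # time with at least one request in flight (like io_ticks)
    weighted_io_ms: float  # in-flight request count integrated over time (like time_in_queue)


def hdd_utilization(s: DiskSample) -> float:
    """Busy-time utilization, as iostat's %util reports it; it cannot exceed 1.0."""
    return min(s.busy_ms / s.interval_ms, 1.0)


def ssd_utilization(s: DiskSample, parallelism: int) -> float:
    """Hypothetical SSD metric: average in-flight requests divided by how many
    requests the device is assumed to serve concurrently. Unlike busy time,
    it exceeds 1.0 when requests queue beyond the internal parallelism."""
    avg_in_flight = s.weighted_io_ms / s.interval_ms
    return avg_in_flight / parallelism


def requests_to_bypass(pending_reads: int, ssd_util: float) -> int:
    """Reads to redirect so the SSD keeps roughly the load it can absorb (util 1.0)."""
    if ssd_util <= 1.0:
        return 0
    # keep pending_reads / ssd_util reads on the SSD, bypass the rest
    return round(pending_reads - pending_reads / ssd_util)


def assign_to_hdds(n: int, hdd_utils: dict[str, float],
                   idle_threshold: float = 0.5) -> dict[str, int]:
    """Split n bypassed reads across idle HDDs in proportion to spare capacity.
    Returns {} when no HDD is idle, so the reads stay on the SSD."""
    idle = {d: 1.0 - u for d, u in hdd_utils.items() if u < idle_threshold}
    if n <= 0 or not idle:
        return {}
    total = sum(idle.values())
    shares = {d: int(n * w / total) for d, w in idle.items()}
    leftover = n - sum(shares.values())
    # hand the rounding remainder to the HDDs with the most spare capacity
    for d in sorted(idle, key=idle.get, reverse=True)[:leftover]:
        shares[d] += 1
    return shares


if __name__ == "__main__":
    ssd = DiskSample(interval_ms=1000, busy_ms=1000, weighted_io_ms=12000)
    print(hdd_utilization(ssd))                    # 1.0: busy time hides the overload
    util = ssd_utilization(ssd, parallelism=8)
    print(util)                                    # 1.5
    n = requests_to_bypass(300, util)
    print(n)                                       # 100
    print(assign_to_hdds(n, {"sda": 0.0, "sdb": 0.25, "sdc": 0.875}))
    # {'sda': 58, 'sdb': 42}
```

In this example the SSD is busy for the whole interval, so a busy-time metric reports exactly 100 percent, while the queue-based estimate reports 150 percent and triggers a bypass of one third of the pending reads. Splitting the bypassed reads in proportion to each idle HDD's spare capacity is just one simple load-balancing choice; round-robin or least-loaded-first assignment would also fit the description in the abstract.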

Original language: English
Pages (from-to): 589-595
Number of pages: 7
Journal: IEEE Transactions on Computers
Volume: 67
Issue number: 4
DOI: 10.1109/TC.2017.2771491
Publication status: Published - 2018 Apr 1

Fingerprint

Caching
Load Balancing
Resource allocation
Hard disk storage
Cache
Parallel I/O
Service Quality
Costs
Big data
Percent
Parallelism
Industry
Internal
Calculate
Vertex of a graph
Experiment

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computational Theory and Mathematics

Cite this

@article{338be178b1c242bcad8e6ba541a54297,
title = "Selective I/O bypass and load balancing method for write-through SSD caching in big data analytics",
abstract = "Fast network quality analysis in the telecom industry is an important method used to provide quality service. SK Telecom, based in South Korea, built a Hadoop-based analytical system consisting of a hundred nodes, each of which only contains hard disk drives (HDDs). Because the analysis process is a set of parallel I/O intensive jobs, adding solid state drives (SSDs) with appropriate settings is the most cost-efficient way to improve the performance, as shown in previous studies. Therefore, we decided to configure SSDs as a write-through cache instead of increasing the number of HDDs. To improve the cost-per-performance of the SSD cache, we introduced a selective I/O bypass (SIB) method, redirecting the automatically calculated number of read I/O requests from the SSD cache to idle HDDs when the SSDs are I/O over-saturated, which means the disk utilization is greater than 100 percent. To precisely calculate the disk utilization, we also introduced a combinational approach for SSDs because the current method used for HDDs cannot be applied to SSDs because of their internal parallelism. In our experiments, the proposed approach achieved a maximum 2x faster performance than other approaches.",
author = "Jaehyung Kim and Hongchan Roh and Park, {Sang Hyun}",
year = "2018",
month = "4",
day = "1",
doi = "10.1109/TC.2017.2771491",
language = "English",
volume = "67",
pages = "589--595",
journal = "IEEE Transactions on Computers",
issn = "0018-9340",
publisher = "IEEE Computer Society",
number = "4",

}

Selective I/O bypass and load balancing method for write-through SSD caching in big data analytics. / Kim, Jaehyung; Roh, Hongchan; Park, Sang Hyun.

In: IEEE Transactions on Computers, Vol. 67, No. 4, 01.04.2018, p. 589-595.

TY  - JOUR
T1  - Selective I/O bypass and load balancing method for write-through SSD caching in big data analytics
AU  - Kim, Jaehyung
AU  - Roh, Hongchan
AU  - Park, Sang Hyun
PY  - 2018/4/1
Y1  - 2018/4/1
N2  - Fast network quality analysis in the telecom industry is an important method used to provide quality service. SK Telecom, based in South Korea, built a Hadoop-based analytical system consisting of a hundred nodes, each of which only contains hard disk drives (HDDs). Because the analysis process is a set of parallel I/O intensive jobs, adding solid state drives (SSDs) with appropriate settings is the most cost-efficient way to improve the performance, as shown in previous studies. Therefore, we decided to configure SSDs as a write-through cache instead of increasing the number of HDDs. To improve the cost-per-performance of the SSD cache, we introduced a selective I/O bypass (SIB) method, redirecting the automatically calculated number of read I/O requests from the SSD cache to idle HDDs when the SSDs are I/O over-saturated, which means the disk utilization is greater than 100 percent. To precisely calculate the disk utilization, we also introduced a combinational approach for SSDs because the current method used for HDDs cannot be applied to SSDs because of their internal parallelism. In our experiments, the proposed approach achieved a maximum 2x faster performance than other approaches.
AB  - Fast network quality analysis in the telecom industry is an important method used to provide quality service. SK Telecom, based in South Korea, built a Hadoop-based analytical system consisting of a hundred nodes, each of which only contains hard disk drives (HDDs). Because the analysis process is a set of parallel I/O intensive jobs, adding solid state drives (SSDs) with appropriate settings is the most cost-efficient way to improve the performance, as shown in previous studies. Therefore, we decided to configure SSDs as a write-through cache instead of increasing the number of HDDs. To improve the cost-per-performance of the SSD cache, we introduced a selective I/O bypass (SIB) method, redirecting the automatically calculated number of read I/O requests from the SSD cache to idle HDDs when the SSDs are I/O over-saturated, which means the disk utilization is greater than 100 percent. To precisely calculate the disk utilization, we also introduced a combinational approach for SSDs because the current method used for HDDs cannot be applied to SSDs because of their internal parallelism. In our experiments, the proposed approach achieved a maximum 2x faster performance than other approaches.
UR  - http://www.scopus.com/inward/record.url?scp=85033685248&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85033685248&partnerID=8YFLogxK
U2  - 10.1109/TC.2017.2771491
DO  - 10.1109/TC.2017.2771491
M3  - Article
AN  - SCOPUS:85033685248
VL  - 67
SP  - 589
EP  - 595
JO  - IEEE Transactions on Computers
JF  - IEEE Transactions on Computers
SN  - 0018-9340
IS  - 4
ER  - 