Take me to SSD

A hybrid block-selection method on HDFS based on storage type

Minkyung Kim, Mincheol Shin, Sang Hyun Park

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

With the rise of the big data era, big data technologies are becoming more important by the day. Hadoop in particular has become a critical part of the overall big data system because of its ability to store, process, and analyze thousands of terabytes of data. A major challenge in sustaining high performance on Hadoop is managing data growth while satisfying heavy storage I/O demand. Hadoop's overall performance is largely determined by storage input/output (I/O), yet storage I/O technologies remain very limited. Studies on improving storage I/O in the Hadoop Distributed File System (HDFS) have therefore been gaining popularity. The latest trend in storage systems is to use hybrid storage devices. However, it is not easy to use information about heterogeneous storage devices in HDFS, because HDFS cannot yet exploit storage type information when reading data. In this paper, we propose a hybrid block-selection method for HDFS that considers the storage type, such as SSD or HDD, when reading data. With this method, the Hadoop ecosystem uses the high bandwidth of SSDs preferentially, which improves its overall performance. In our experiments, the new method reduced the execution time of a select count(*) query and the TPC-H benchmark by up to 22% and 30% on average.
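The abstract does not spell out the selection algorithm, so the following is only an illustrative sketch of what storage-type-aware replica selection could look like, not the authors' implementation. It assumes that each block replica is tagged with a storage type and a network distance from the reader, and that SSD replicas are preferred over HDD replicas before distance is considered. The `Replica` class, the `STORAGE_PRIORITY` table, and both function names are hypothetical and are not part of HDFS.

```python
from dataclasses import dataclass

# Hypothetical ranking: a lower value means the storage type is preferred for reads.
STORAGE_PRIORITY = {"SSD": 0, "HDD": 1}


@dataclass(frozen=True)
class Replica:
    datanode: str      # hypothetical DataNode name
    storage_type: str  # "SSD" or "HDD"
    distance: int      # hypothetical network distance from the reader (0 = local)


def order_replicas(replicas):
    """Sort replicas by storage type first, then by network distance.

    Unknown storage types are ranked after all known ones.
    """
    unknown_rank = len(STORAGE_PRIORITY)
    return sorted(
        replicas,
        key=lambda r: (STORAGE_PRIORITY.get(r.storage_type, unknown_rank), r.distance),
    )


def select_replica(replicas):
    """Pick the replica to read from: the nearest SSD replica if one exists."""
    if not replicas:
        raise ValueError("block has no replicas")
    return order_replicas(replicas)[0]


if __name__ == "__main__":
    block_replicas = [
        Replica("dn1", "HDD", 0),
        Replica("dn2", "SSD", 2),
        Replica("dn3", "SSD", 1),
    ]
    print(select_replica(block_replicas).datanode)
```

In this sketch a remote SSD replica wins over a local HDD replica. Whether the paper makes the same trade-off between storage type and locality is not stated in the abstract.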

Original language: English
Title of host publication: 2016 Symposium on Applied Computing, SAC 2016
Publisher: Association for Computing Machinery
Pages: 965-971
Number of pages: 7
Volume: 04-08-April-2016
ISBN (Electronic): 9781450337397
DOIs: https://doi.org/10.1145/2851613.2851658
Publication status: Published - 2016 Apr 4
Event: 31st Annual ACM Symposium on Applied Computing, SAC 2016 - Pisa, Italy
Duration: 2016 Apr 4 to 2016 Apr 8

Other

Other: 31st Annual ACM Symposium on Applied Computing, SAC 2016
Country: Italy
City: Pisa
Period: 16/4/4 to 16/4/8

Fingerprint

Bandwidth
Big data
Experiments

All Science Journal Classification (ASJC) codes

  • Software

Cite this

Kim, M., Shin, M., & Park, S. H. (2016). Take me to SSD: A hybrid block-selection method on HDFS based on storage type. In 2016 Symposium on Applied Computing, SAC 2016 (Vol. 04-08-April-2016, pp. 965-971). Association for Computing Machinery. https://doi.org/10.1145/2851613.2851658