Replica parallelism to utilize the granularity of data

Won Gi Choi, Sang Hyun Park

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

As the volume of relational data has increased significantly, big data technologies have drawn attention in recent years. The Hadoop Distributed File System (HDFS) [14] is the foundation of several big data systems and enables large data sets to be stored across an environment composed of many computers. HDFS divides large data into blocks, and each block is distributed to and stored on a computer. To ensure data reliability, HDFS replicates each data block. HDFS generally provides high throughput when a client accesses data. However, its architecture is designed mainly for large, sequential access patterns; small, random accesses do not suit HDFS and expose several of its performance weaknesses. HBase [2], part of the Hadoop ecosystem, is a distributed data store that can handle small, random reads and writes efficiently. HBase builds on the HDFS structure, but its block size is smaller than that of HDFS, and an HBase file is composed of blocks organized by an index structure. As more software related to big data science has emerged, research on improving system performance has also grown steadily.
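The contrast the abstract draws between HDFS and HBase hinges on block size and replication. As a minimal illustration of that behavior, the sketch below uses the standard Hadoop Java client API to write a file with an explicit block size and replication factor; the path and the specific values are hypothetical, chosen only to mirror common HDFS defaults, and are not taken from the paper.

// Hedged sketch: writes a file to HDFS with an explicit block size and
// replication factor, mirroring the large-block, replicated design the
// abstract describes. Path and values are illustrative, not from the paper.
// Assumes the cluster's core-site.xml / hdfs-site.xml are on the classpath.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBlockSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("dfs.blocksize", "134217728"); // 128 MB blocks: suited to large, sequential I/O
        conf.set("dfs.replication", "3");       // each block is stored on three machines for reliability

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/tmp/example/sequential.dat"); // hypothetical path

        // Large sequential writes are the access pattern HDFS is optimized for.
        try (FSDataOutputStream out = fs.create(file)) {
            byte[] chunk = new byte[1 << 20]; // write 1 MB per call, 256 MB total
            for (int i = 0; i < 256; i++) {
                out.write(chunk);
            }
        }
        fs.close();
    }
}

By contrast, HBase stores its data in HFiles on HDFS using far smaller blocks (64 KB by default, configurable per column family), and the index kept over those blocks is what makes small, random reads and writes practical, as the abstract notes.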

Original language: English
Title of host publication: Proceedings of the 6th International Conference on Emerging Databases
Subtitle of host publication: Technologies, Applications, and Theory, EDB 2016
Editors: Carson K. Leung
Publisher: Association for Computing Machinery
Pages: 35-42
Number of pages: 8
ISBN (Electronic): 9781450347549
DOI: https://doi.org/10.1145/3007818.3007835
Publication status: Published - 2016 Oct 17
Event: 6th International Conference on Emerging Databases: Technologies, Applications, and Theory, EDB 2016 - Jeju Island, Korea, Republic of
Duration: 2016 Oct 17 - 2016 Oct 19

Publication series

Name: ACM International Conference Proceeding Series

Other

Other: 6th International Conference on Emerging Databases: Technologies, Applications, and Theory, EDB 2016
Country: Korea, Republic of
City: Jeju Island
Period: 16/10/17 - 16/10/19

Fingerprint

  • Random processes
  • Throughput
  • Big data

All Science Journal Classification (ASJC) codes

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Cite this

Choi, W. G., & Park, S. H. (2016). Replica parallelism to utilize the granularity of data. In C. K. Leung (Ed.), Proceedings of the 6th International Conference on Emerging Databases: Technologies, Applications, and Theory, EDB 2016 (pp. 35-42). (ACM International Conference Proceeding Series). Association for Computing Machinery. https://doi.org/10.1145/3007818.3007835