Replica parallelism to utilize the granularity of data

Won Gi Choi, Sanghyun Park

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

As the volume of relational data has increased significantly, big data technologies have attracted attention in recent years. The Hadoop Distributed File System (HDFS) [14] is the basis of several big data systems and enables large data sets to be stored across a big data environment composed of many computers. HDFS divides large data into several blocks, and each block is distributed to and stored on a node. To ensure data reliability, HDFS replicates each data block. In general, HDFS provides high throughput when a client accesses data. However, the architecture of HDFS is mainly designed for large, sequential access patterns; small, random data inputs are not well suited to HDFS and expose several performance weaknesses. HBase [2], part of the Hadoop ecosystem, is a distributed data store that can process small, random reads and writes efficiently. HBase is built on HDFS, but its block size is smaller than that of HDFS, and an HBase file is composed of blocks organized by an index structure. As more software related to big data science has emerged, research on improving the performance of these systems has also grown steadily.
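
The abstract's background on HDFS block division and replication can be illustrated with a short sketch. The following Java example (not taken from the paper) writes a file to HDFS through the standard org.apache.hadoop.fs.FileSystem API with an explicit replication factor and block size; the NameNode address, path, and chosen values are illustrative assumptions, not parameters used by the authors.

    // Minimal sketch: store a file on HDFS with an explicit block size and
    // replication factor, mirroring the block division and replication
    // described in the abstract. Cluster address and values are assumptions.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReplicationExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Assumed NameNode address; adjust to the target cluster.
            conf.set("fs.defaultFS", "hdfs://localhost:9000");

            FileSystem fs = FileSystem.get(conf);
            Path path = new Path("/tmp/replica-example.dat");

            // create(path, overwrite, bufferSize, replication, blockSize):
            // each block gets 3 replicas; blocks are 128 MB (a common default).
            try (FSDataOutputStream out = fs.create(
                    path, true, 4096, (short) 3, 128L * 1024 * 1024)) {
                out.writeBytes("example payload\n");
            }

            // Each block of the file is now stored on three different DataNodes.
            System.out.println("Replication: "
                    + fs.getFileStatus(path).getReplication());
        }
    }

A smaller block size and a per-block index, as in HBase, favor small random accesses, whereas the large sequential blocks shown above favor high-throughput scans.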

Original language: English
Title of host publication: Proceedings of the 6th International Conference on Emerging Databases
Subtitle of host publication: Technologies, Applications, and Theory, EDB 2016
Editors: Carson K. Leung
Publisher: Association for Computing Machinery
Pages: 35-42
Number of pages: 8
ISBN (Electronic): 9781450347549
DOIs: https://doi.org/10.1145/3007818.3007835
Publication status: Published - 2016 Oct 17
Event: 6th International Conference on Emerging Databases: Technologies, Applications, and Theory, EDB 2016 - Jeju Island, Korea, Republic of
Duration: 2016 Oct 17 - 2016 Oct 19

Publication series

Name: ACM International Conference Proceeding Series

Other

Other: 6th International Conference on Emerging Databases: Technologies, Applications, and Theory, EDB 2016
Country: Korea, Republic of
City: Jeju Island
Period: 16/10/17 - 16/10/19


All Science Journal Classification (ASJC) codes

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Cite this

Choi, W. G., & Park, S. (2016). Replica parallelism to utilize the granularity of data. In C. K. Leung (Ed.), Proceedings of the 6th International Conference on Emerging Databases: Technologies, Applications, and Theory, EDB 2016 (pp. 35-42). (ACM International Conference Proceeding Series). Association for Computing Machinery. https://doi.org/10.1145/3007818.3007835