RDFChain: Chain centric storage for scalable join processing of RDF graphs using MapReduce and HBase

Pilsik Choi, Jooik Jung, Kyong Ho Lee

Research output: Contribution to journal, Conference article

10 Citations (Scopus)

Abstract

As massive amounts of Linked Open Data become available in RDF, scalable storage and efficient retrieval using MapReduce have been actively studied. Most previous research focuses on reducing the number of MapReduce jobs needed to process the join operations in SPARQL queries. However, because these approaches rely on reduce-side joins, they still incur the cost of the shuffle phase. In this paper, we propose RDFChain, which supports scalable storage and efficient retrieval of large volumes of RDF data by combining MapReduce with HBase, a NoSQL storage system. Because RDFChain's storage schema reflects all possible join patterns of queries, it reduces the number of storage accesses required, depending on a query's join pattern. In addition, RDFChain's cost-based map-side join reduces the number of map jobs by using statistics to process as many joins as possible within a single map job.
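The abstract contrasts reduce-side joins, which require a shuffle phase, with map-side joins. As a rough illustration of the general map-side idea only, the sketch evaluates a chain-shaped query (subject-object joins along a path) entirely inside each map task by probing a subject-keyed lookup table for every hop, so no intermediate bindings are shuffled to reducers. The plain dict standing in for an HBase table, the function names `build_subject_index` and `map_chain_join`, and the example triples are all hypothetical. This is not RDFChain's actual storage schema or its statistics-driven, cost-based planner.

```python
from collections import defaultdict


def build_subject_index(triples):
    """Group triples by subject.

    Hypothetical stand-in: a dict plays the role of a key-value store such as
    an HBase table keyed by subject. This is not RDFChain's storage schema.
    """
    index = defaultdict(list)
    for s, p, o in triples:
        index[s].append((p, o))
    return dict(index)


def map_chain_join(split, index, first_pred, chain_preds):
    """Map task for one input split (illustrative sketch).

    Evaluates the chain query
        ?x first_pred ?y0 . ?y0 chain_preds[0] ?y1 . ...
    by looking up each hop in the index, so every join finishes inside the
    map task and no shuffle or reduce phase is needed.
    """
    results = []
    for s, p, o in split:
        if p != first_pred:
            continue
        bindings = [(s, o)]
        for pred in chain_preds:
            # Join on the last bound variable: fetch its outgoing edges.
            bindings = [
                b + (o2,)
                for b in bindings
                for p2, o2 in index.get(b[-1], [])
                if p2 == pred
            ]
            if not bindings:
                break  # chain broken; this start triple yields no answer
        results.extend(bindings)
    return results


if __name__ == "__main__":
    triples = [
        ("alice", "knows", "bob"),
        ("bob", "worksAt", "acme"),
        ("acme", "locatedIn", "sydney"),
        ("carol", "knows", "dave"),
        ("dave", "worksAt", "initech"),
    ]
    index = build_subject_index(triples)
    # Two input splits, each processed independently as a separate map task.
    splits = [triples[:3], triples[3:]]
    rows = [
        row
        for split in splits
        for row in map_chain_join(split, index, "knows", ["worksAt", "locatedIn"])
    ]
    print(rows)  # [('alice', 'bob', 'acme', 'sydney')]
```

Each split is handled independently, mirroring how separate map tasks would run. The chain starting at `carol` is dropped because `initech` has no `locatedIn` edge. A real system would also have to account for the cost of remote lookups, which is the kind of trade-off the abstract says RDFChain's statistics-based planning addresses.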

Original language: English
Pages (from-to): 249-252
Number of pages: 4
Journal: CEUR Workshop Proceedings
Volume: 1035
Publication status: Published - 2013 Jan 1
Event: 12th International Semantic Web Conference, ISWC 2013 - Sydney, Australia
Duration: 2013 Oct 23 → …


All Science Journal Classification (ASJC) codes

  • Computer Science(all)
