As massive amounts of linked open data have become available in RDF, scalable storage and efficient retrieval using MapReduce have been actively studied. Most previous research focuses on reducing the number of MapReduce jobs needed to process the join operations in SPARQL queries. However, because these approaches rely on reduce-side joins, they still incur the cost of the shuffle phase. In this paper, we propose RDFChain, which supports scalable storage and efficient retrieval of large volumes of RDF data by combining MapReduce with HBase, a NoSQL storage system. Because RDFChain's storage schema reflects all possible join patterns of queries, it reduces the number of storage accesses according to the join pattern of each query. In addition, RDFChain's cost-based map-side join uses statistics to process as many joins as possible within a single map job, which reduces the number of map jobs.
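The abstract does not spell out RDFChain's storage schema or cost model, so the following is only a toy, single-process sketch of the two general ideas it names, not RDFChain's implementation. The assumptions are as follows. Python dicts keyed by (subject, predicate) and (predicate, object) stand in for HBase tables. Predicate counts serve as the "statistics." A greedy rule (fewer unbound variables first, then lower predicate frequency, staying connected to already bound variables) is a hypothetical cost model. The map-side join is modeled as what one map task could do: extend each partial binding through direct key lookups, with no shuffle step. All helper names and the sample triples are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy dataset (not from the paper).
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "acme"),
    ("bob", "worksAt", "acme"),
    ("carol", "worksAt", "initech"),
    ("acme", "locatedIn", "sydney"),
    ("initech", "locatedIn", "austin"),
]

def is_var(term):
    """SPARQL-style variables start with '?'; constants are assumed not to."""
    return term.startswith("?")

def build_indexes(triples):
    """Two key-value 'tables' standing in for HBase tables (assumption):
    one keyed by (subject, predicate), one keyed by (predicate, object)."""
    sp, po = defaultdict(list), defaultdict(list)
    for s, p, o in triples:
        sp[(s, p)].append(o)
        po[(p, o)].append(s)
    return sp, po

def predicate_stats(triples):
    """Statistics for the hypothetical cost model: triple count per predicate."""
    stats = defaultdict(int)
    for _, p, _ in triples:
        stats[p] += 1
    return stats

def order_patterns(patterns, stats):
    """Greedy cost-based ordering; predicates are assumed to be constants."""
    remaining, bound, ordered = list(patterns), set(), []
    while remaining:
        def unbound(pt):
            return sum(1 for t in (pt[0], pt[2]) if is_var(t) and t not in bound)
        # Prefer patterns that share a variable with what is already bound.
        connected = [pt for pt in remaining
                     if not bound or any(t in bound for t in (pt[0], pt[2]))]
        best = min(connected or remaining, key=lambda pt: (unbound(pt), stats[pt[1]]))
        ordered.append(best)
        remaining.remove(best)
        bound.update(t for t in (best[0], best[2]) if is_var(t))
    return ordered

def match(pattern, binding, sp, po, triples):
    """Extend one binding with one triple pattern. Uses a key lookup when the
    subject or object is already known, otherwise scans by predicate."""
    s, p, o = pattern
    s_val = binding.get(s, s) if is_var(s) else s
    o_val = binding.get(o, o) if is_var(o) else o
    if not is_var(s_val):
        candidates = [(s_val, obj) for obj in sp.get((s_val, p), [])]
    elif not is_var(o_val):
        candidates = [(subj, o_val) for subj in po.get((p, o_val), [])]
    else:
        candidates = [(ts, to) for ts, tp, to in triples if tp == p]
    for cs, co in candidates:
        if not is_var(o_val) and co != o_val:
            continue  # object already fixed and does not match
        new = dict(binding)
        if is_var(s_val):
            new[s] = cs
        if is_var(o_val):
            new[o] = co
        yield new

def map_side_join(patterns, sp, po, triples, stats):
    """Evaluate all patterns in one pass: no shuffle, only lookups."""
    bindings = [{}]
    for pattern in order_patterns(patterns, stats):
        bindings = [nb for b in bindings for nb in match(pattern, b, sp, po, triples)]
    return bindings

if __name__ == "__main__":
    sp, po = build_indexes(TRIPLES)
    stats = predicate_stats(TRIPLES)
    query = [("?x", "knows", "?y"), ("?y", "worksAt", "?c"), ("?c", "locatedIn", "sydney")]
    print(map_side_join(query, sp, po, TRIPLES, stats))
```

For the sample chain query, the greedy rule starts with the `locatedIn` pattern because it has a constant object, then follows the chain through `worksAt` and `knows`. It yields a single binding: `?x` = alice, `?y` = bob, `?c` = acme. Both joins complete inside one simulated pass, which is the property the abstract attributes to packing several joins into one map job.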
| Field | Value |
|---|---|
| Number of pages | 4 |
| Journal | CEUR Workshop Proceedings |
| Publication status | Published - 2013 Jan 1 |
| Event | 12th International Semantic Web Conference, ISWC 2013 - Sydney, Australia |
Duration: 2013 Oct 23 → …
All Science Journal Classification (ASJC) codes
- Computer Science(all)