VIP Hashing-Adapting to Skew in Popularity of Data on the Fly

Aarati Kakaraparthy, Jignesh M. Patel, Brian P. Kroth, Kwanghyun Park

    Research output: Contribution to journalConference articlepeer-review


    All data is not equally popular. Often, some portion of data is more frequently accessed than the rest, which causes a skew in popularity of the data items. Adapting to this skew can improve performance, and this topic has been studied extensively in the past for disk-based settings. In this work, we consider an in-memory data structure, namely hash table, and show how one can leverage the skew in popularity for higher performance. Hashing is a low-latency operation, sensitive to the effects of caching and code complexity, among other factors. These factors make learning in-the-loop challenging as the overhead of performing additional operations can have significant impact on performance. In this paper, we propose VIP hashing, a hash table method that uses lightweight mechanisms for learning the skew in popularity and adapting the hash table layout on the fly. These mechanisms are non-blocking, i.e, the hash table is operational at all times. The overhead is controlled by sensing changes in the popularity distribution to dynamically switch-on/off the mechanisms as needed. We ran extensive tests against a host of workloads generated by Wiscer, a homegrown benchmarking tool, and we find that VIP hashing improves performance in the presence of skew (22% increase in fetch operation throughput for a hash table with 1M keys under low skew) while adapting to insert and delete operations, and changing popularity distribution of keys on the fly. Our experiments on DuckDB show that VIP hashing reduces the end-to-end execution time of TPC-H query 9 by 20% under low skew.

    Original languageEnglish
    Pages (from-to)1978-1990
    Number of pages13
    JournalProceedings of the VLDB Endowment
    Issue number10
    Publication statusPublished - 2022
    Event48th International Conference on Very Large Data Bases, VLDB 2022 - Sydney, Australia
    Duration: 2022 Sept 52022 Sept 9

    Bibliographical note

    Funding Information:
    This research was supported in part by a grant from the Microsoft Jim Gray Systems Lab, by the National Science Foundation under grant OAC-1835446, and by CRISP, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program.

    Publisher Copyright:
    © 2022, VLDB Endowment., All rights reserved.

    All Science Journal Classification (ASJC) codes

    • Computer Science (miscellaneous)
    • Computer Science(all)


    Dive into the research topics of 'VIP Hashing-Adapting to Skew in Popularity of Data on the Fly'. Together they form a unique fingerprint.

    Cite this