All data is not equally popular. Often, some portion of data is more frequently accessed than the rest, which causes a skew in popularity of the data items. Adapting to this skew can improve performance, and this topic has been studied extensively in the past for disk-based settings. In this work, we consider an in-memory data structure, namely hash table, and show how one can leverage the skew in popularity for higher performance. Hashing is a low-latency operation, sensitive to the effects of caching and code complexity, among other factors. These factors make learning in-the-loop challenging as the overhead of performing additional operations can have significant impact on performance. In this paper, we propose VIP hashing, a hash table method that uses lightweight mechanisms for learning the skew in popularity and adapting the hash table layout on the fly. These mechanisms are non-blocking, i.e, the hash table is operational at all times. The overhead is controlled by sensing changes in the popularity distribution to dynamically switch-on/off the mechanisms as needed. We ran extensive tests against a host of workloads generated by Wiscer, a homegrown benchmarking tool, and we find that VIP hashing improves performance in the presence of skew (22% increase in fetch operation throughput for a hash table with 1M keys under low skew) while adapting to insert and delete operations, and changing popularity distribution of keys on the fly. Our experiments on DuckDB show that VIP hashing reduces the end-to-end execution time of TPC-H query 9 by 20% under low skew.
|Number of pages||13|
|Journal||Proceedings of the VLDB Endowment|
|Publication status||Published - 2022|
|Event||48th International Conference on Very Large Data Bases, VLDB 2022 - Sydney, Australia|
Duration: 2022 Sept 5 → 2022 Sept 9
Bibliographical noteFunding Information:
This research was supported in part by a grant from the Microsoft Jim Gray Systems Lab, by the National Science Foundation under grant OAC-1835446, and by CRISP, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program.
© 2022, VLDB Endowment., All rights reserved.
All Science Journal Classification (ASJC) codes
- Computer Science (miscellaneous)
- Computer Science(all)