An inverted index is a core data structure of Information Retrieval systems, especially in search engines. Since the search environments have become more dynamic, many on-line index maintenance strategies have been proposed. Previous strategies were designed for HDDs. Consequently, in order to avoid expensive random access cost, Merge-based strategies have been preferred to In-place index update strategies on HDDs. However, flashSSDs have become solid alternatives to HDDs. FlashSSDs currently are adopted in a wide range of areas due to their superior features such as the short access latency, energy efficiency, and high bandwidth. In this article, we first reexamined potentials of In-place index update strategies on flashSSDs. Thanks to the insignificant access latency of flashSSDs, we discovered that In-place index update strategies outperform Merge-based strategies, since In-place index update strategies generate much less amount of I/O than Merge-based strategies despite inducing frequent random accesses. Based on this discovery, we suggest a new inverted index maintenance strategy based on an In-place index update strategy for flashSSDs, called Multipath Flash In-place Strategy (MFIS). To enhance the index maintenance performance, MFIS stores the posting list of each term non-contiguously and exploits the internal parallelism of flashSSDs. Thus, MFIS not only induces the minimum amount of I/O but also utilizes the maximum bandwidth of flashSSDs. Furthermore, MFIS is designed to show high query processing performance by utilizing the internal parallelism of flashSSDs even though the posting list of each term is stored non-contiguously. In our experiments, the index maintenance performance of MFIS was considerably better than other previous maintenance strategies. The index maintenance performance was up to 14.93, 4.04, 5.12, and 2.33 times higher than Merge-based strategies such as Immediate Merge, Geometric Partitioning, Hybrid, and SSD-aware Hybrid, respectively. The query processing performance of MFIS was up to 1.62 times higher than non-contiguous In-place. In addition, MFIS showed almost the best query processing performance as Merge-based strategies did. In conclusion, MFIS is the best on-line inverted index maintenance strategy on flashSSDs in terms of both index maintenance and query processing performance.
All Science Journal Classification (ASJC) codes
- Information Systems
- Hardware and Architecture