Robust distributed indexing for locality-skewed workloads

Mu Woong Lee, Seung Won Hwang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Multidimensional indexing is crucial for enabling a fast search over large-scale data. Owing to the unprecedented scale of data, extending such indexing technology has recently gained attention in distributed environments. The goal of existing efforts in distributed indexing has been the localization of queries to data residing at a small number of nodes (i.e., locality-preserving indexing) to minimize communication cost. However, considering that workloads often correlate with data locality, such indexing often generates hotspots. Location-based queries are typically skewed to disaster areas during certain periods of time, e.g., during Hurricane Irene, search traffic increased by more than 2000%. To alleviate such hotspots, we propose workload-balancing as an optimization goal. A cost model analytically supporting the need for load balancing is first developed, then a distributed index that evenly distributes the workload is presented. Our empirical study suggests that hotspots degrading search performance can be effectively alleviated. Specifically, when deployed to Amazon EC2, our proposed scheme showed maximum speed-up of 127.7%. Even in hostile settings where workload is not at all correlated with the search criteria, the proposed scheme's performance is comparable to existing approaches optimized for such settings.

Original language: English
Title of host publication: CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management
Pages: 1342-1351
Number of pages: 10
DOIs: https://doi.org/10.1145/2396761.2398438
Publication status: Published - 2012 Dec 19
Event: 21st ACM International Conference on Information and Knowledge Management, CIKM 2012 - Maui, HI, United States
Duration: 2012 Oct 29 to 2012 Nov 2

Other

Other: 21st ACM International Conference on Information and Knowledge Management, CIKM 2012
Country: United States
City: Maui, HI
Period: 12/10/29 to 12/11/2

Fingerprint

Hurricanes
Disasters
Resource allocation
Costs
Communication

All Science Journal Classification (ASJC) codes

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software

Cite this

Lee, M. W., & Hwang, S. W. (2012). Robust distributed indexing for locality-skewed workloads. In CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management (pp. 1342-1351) https://doi.org/10.1145/2396761.2398438
Lee, Mu Woong ; Hwang, Seung Won. / Robust distributed indexing for locality-skewed workloads. CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management. 2012. pp. 1342-1351
@inproceedings{ec2ae9401b0545b9a0bf474cae70feb4,
title = "Robust distributed indexing for locality-skewed workloads",
abstract = "Multidimensional indexing is crucial for enabling a fast search over large-scale data. Owing to the unprecedented scale of data, extending such indexing technology has recently gained attention in distributed environments. The goal of existing efforts in distributed indexing has been the localization of queries to data residing at a small number of nodes (i.e., locality-preserving indexing) to minimize communication cost. However, considering that workloads often correlate with data locality, such indexing often generates hotspots. Location-based queries are typically skewed to disaster areas during certain periods of time, e.g., during Hurricane Irene, search traffic increased by more than 2000{\%}. To alleviate such hotspots, we propose workload-balancing as an optimization goal. A cost model analytically supporting the need for load balancing is first developed, then a distributed index that evenly distributes the workload is presented. Our empirical study suggests that hotspots degrading search performance can be effectively alleviated. Specifically, when deployed to Amazon EC2, our proposed scheme showed maximum speed-up of 127.7{\%}. Even in hostile settings where workload is not at all correlated with the search criteria, the proposed scheme's performance is comparable to existing approaches optimized for such settings.",
author = "Lee, {Mu Woong} and Hwang, {Seung Won}",
year = "2012",
month = "12",
day = "19",
doi = "10.1145/2396761.2398438",
language = "English",
isbn = "9781450311564",
pages = "1342--1351",
booktitle = "CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management",
}

Lee, MW & Hwang, SW 2012, Robust distributed indexing for locality-skewed workloads. in CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management. pp. 1342-1351, 21st ACM International Conference on Information and Knowledge Management, CIKM 2012, Maui, HI, United States, 12/10/29. https://doi.org/10.1145/2396761.2398438

Robust distributed indexing for locality-skewed workloads. / Lee, Mu Woong; Hwang, Seung Won.

CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management. 2012. p. 1342-1351.

TY - GEN

T1 - Robust distributed indexing for locality-skewed workloads

AU - Lee, Mu Woong

AU - Hwang, Seung Won

PY - 2012/12/19

Y1 - 2012/12/19

AB - Multidimensional indexing is crucial for enabling a fast search over large-scale data. Owing to the unprecedented scale of data, extending such indexing technology has recently gained attention in distributed environments. The goal of existing efforts in distributed indexing has been the localization of queries to data residing at a small number of nodes (i.e., locality-preserving indexing) to minimize communication cost. However, considering that workloads often correlate with data locality, such indexing often generates hotspots. Location-based queries are typically skewed to disaster areas during certain periods of time, e.g., during Hurricane Irene, search traffic increased by more than 2000%. To alleviate such hotspots, we propose workload-balancing as an optimization goal. A cost model analytically supporting the need for load balancing is first developed, then a distributed index that evenly distributes the workload is presented. Our empirical study suggests that hotspots degrading search performance can be effectively alleviated. Specifically, when deployed to Amazon EC2, our proposed scheme showed maximum speed-up of 127.7%. Even in hostile settings where workload is not at all correlated with the search criteria, the proposed scheme's performance is comparable to existing approaches optimized for such settings.

UR - http://www.scopus.com/inward/record.url?scp=84871060489&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84871060489&partnerID=8YFLogxK

U2 - 10.1145/2396761.2398438

DO - 10.1145/2396761.2398438

M3 - Conference contribution

SN - 9781450311564

SP - 1342

EP - 1351

BT - CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management

ER -

Lee MW, Hwang SW. Robust distributed indexing for locality-skewed workloads. In CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management. 2012. p. 1342-1351 https://doi.org/10.1145/2396761.2398438