Predictive parallelization: Taming tail latencies in web search

Myeongjae Jeon, Saehoon Kim, Seung Won Hwang, Yuxiong He, Sameh Elnikety, Alan L. Cox, Scott Rixner

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

37 Citations (Scopus)

Abstract

Web search engines are optimized to reduce the high-percentile response time to consistently provide fast responses to almost all user queries. This is a challenging task because the query workload exhibits large variability, consisting of many short-running queries and a few long-running queries that significantly impact the high-percentile response time. With modern multicore servers, parallelizing the processing of an individual query is a promising solution to reduce query execution time, but it gives limited benefits compared to sequential execution since most queries see little or no speedup when parallelized. The root of this problem is that short-running queries, which dominate the workload, do not benefit from parallelization. They incur a large parallelization overhead, taking scarce resources from long-running queries. On the other hand, parallelization substantially reduces the execution time of long-running queries with low overhead and high parallelization efficiency. Motivated by these observations, we propose a predictive parallelization framework with two parts: (1) predicting long-running queries, and (2) selectively parallelizing them. For the first part, prediction should be accurate and efficient. For accuracy, we study a comprehensive feature set covering both term features (reflecting dynamic pruning efficiency) and query features (reflecting query complexity). For efficiency, to keep overhead low, we avoid expensive features that have excessive requirements such as large memory footprints. For the second part, we use the predicted query execution time to parallelize long-running queries and process short-running queries sequentially. We implement and evaluate the predictive parallelization framework in Microsoft Bing search. Our measurements show that under moderate to heavy load, the predictive strategy reduces the 99th-percentile response time by 50% (from 200 ms to 100 ms) compared with prior approaches that parallelize all queries.
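The abstract's two-part design lends itself to a short sketch: a cheap predictor estimates a query's execution time from term and query features, and only queries predicted to run long are split across threads, while the short-running majority stays sequential. The toy inverted index, the linear predictor and its weights, the 50 ms threshold, and all function names below are assumptions invented for this illustration; they are not the paper's learned predictor or Bing's implementation.

```python
"""Minimal sketch of predictive (selective) parallelization.

Everything here is hypothetical: the paper trains a predictor on term
features (e.g., posting-list statistics, reflecting dynamic-pruning
efficiency) and query features (e.g., term count, reflecting query
complexity); a made-up linear model suffices to show the control flow.
"""
from concurrent.futures import ThreadPoolExecutor

# Toy inverted index: term -> posting list of (doc_id, score).
INDEX = {
    "cheap":   [(1, 0.4), (3, 0.9), (7, 0.2)],
    "flights": [(1, 0.7), (2, 0.1), (3, 0.5), (9, 0.8)],
}

PARALLEL_THRESHOLD_MS = 50.0  # assumed cutoff separating "long" queries
DEGREE = 4                    # assumed intra-query parallelism

def predict_time_ms(terms):
    """Stand-in for the learned execution-time predictor (weights made up)."""
    postings = sum(len(INDEX.get(t, [])) for t in terms)
    return 0.01 * postings + 1.0 * len(terms)

def score_partition(terms, doc_ids):
    """Sequential scoring kernel over one shard of the document space."""
    scores = {}
    for t in terms:
        for doc, s in INDEX.get(t, []):
            if doc in doc_ids:
                scores[doc] = scores.get(doc, 0.0) + s
    return scores

def search(query, pool):
    terms = query.split()
    all_docs = {d for plist in INDEX.values() for d, _ in plist}
    if predict_time_ms(terms) < PARALLEL_THRESHOLD_MS:
        # Predicted short: run sequentially and avoid parallel overhead.
        return score_partition(terms, all_docs)
    # Predicted long: split documents into DEGREE disjoint shards, score
    # them concurrently, then merge the partial score maps.
    docs = sorted(all_docs)
    shards = [set(docs[i::DEGREE]) for i in range(DEGREE)]
    merged = {}
    for partial in pool.map(lambda s: score_partition(terms, s), shards):
        merged.update(partial)  # shards are disjoint, so no conflicts
    return merged

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=DEGREE) as pool:
        print(search("cheap flights", pool))  # predicted short -> sequential
```

The threshold test carries the abstract's core trade-off: short queries dominate the workload and would pay parallelization overhead for little speedup, so only the predicted-long tail is fanned out across cores.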

Original language: English
Title of host publication: SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery
Pages: 253-262
Number of pages: 10
ISBN (Print): 9781450322591
DOI: 10.1145/2600428.2609572
Publication status: Published - 2014 Jan 1
Event: 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2014 - Gold Coast, QLD, Australia
Duration: 2014 Jul 6 - 2014 Jul 11

Publication series

Name: SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval

Other

Other: 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2014
Country: Australia
City: Gold Coast, QLD
Period: 14/7/6 - 14/7/11

All Science Journal Classification (ASJC) codes

  • Computer Graphics and Computer-Aided Design
  • Information Systems

Cite this

Jeon, M., Kim, S., Hwang, S. W., He, Y., Elnikety, S., Cox, A. L., & Rixner, S. (2014). Predictive parallelization: Taming tail latencies in web search. In SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 253-262). (SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval). Association for Computing Machinery. https://doi.org/10.1145/2600428.2609572