Minimal probing

Supporting expensive predicates for top-K queries

Kevin Chen Chuan Chang, Seung Won Hwang

Research output: Contribution to journalConference article

179 Citations (Scopus)

Abstract

This paper addresses the problem of evaluating ranked top-k queries with expensive predicates. As major DBMSs now all support expensive user-defined predicates for Boolean queries, we believe such support for ranked queries will be even more important: First, ranked queries often need to model user-specific concepts of preference, relevance, or similarity, which call for dynamic user-defined functions. Second, middleware systems must incorporate external predicates for integrating autonomous sources typically accessible only by per-object queries. Third, fuzzy joins are inherently expensive, as they are essentially user-defined operations that dynamically associate multiple relations. These predicates, being dynamically defined or externally accessed, cannot rely on index mechanisms to provide zero-time sorted output, and must instead require per-object probe to evaluate. The current standard sort-merge framework for ranked queries cannot efficiently handle such predicates because it must completely probe all objects, before sorting and merging them to produce top-k answers. To minimize expensive probes, we thus develop the formal principle of "necessary probes," which determines if a probe is absolutely required. We then propose Algorithm MPro which, by implementing the principle, is provably optimal with minimal probe cost. Further, we show that MPro can scale well and can be easily parallelized. Our experiments using both a real-estate benchmark database and synthetic datasets show that MPro enables significant probe reduction, which can be orders of magnitude faster than the standard scheme using complete probing.

Original languageEnglish
Pages (from-to)346-357
Number of pages12
JournalProceedings of the ACM SIGMOD International Conference on Management of Data
Publication statusPublished - 2002 Sep 17

Fingerprint

Middleware
Sorting
Merging
Costs
Experiments

All Science Journal Classification (ASJC) codes

  • Computer Science(all)

Cite this

@article{60c14b76f7b54cfb9b16361a289b0fb8,
title = "Minimal probing: Supporting expensive predicates for top-K queries",
abstract = "This paper addresses the problem of evaluating ranked top-k queries with expensive predicates. As major DBMSs now all support expensive user-defined predicates for Boolean queries, we believe such support for ranked queries will be even more important: First, ranked queries often need to model user-specific concepts of preference, relevance, or similarity, which call for dynamic user-defined functions. Second, middleware systems must incorporate external predicates for integrating autonomous sources typically accessible only by per-object queries. Third, fuzzy joins are inherently expensive, as they are essentially user-defined operations that dynamically associate multiple relations. These predicates, being dynamically defined or externally accessed, cannot rely on index mechanisms to provide zero-time sorted output, and must instead require per-object probe to evaluate. The current standard sort-merge framework for ranked queries cannot efficiently handle such predicates because it must completely probe all objects, before sorting and merging them to produce top-k answers. To minimize expensive probes, we thus develop the formal principle of {"}necessary probes,{"} which determines if a probe is absolutely required. We then propose Algorithm MPro which, by implementing the principle, is provably optimal with minimal probe cost. Further, we show that MPro can scale well and can be easily parallelized. Our experiments using both a real-estate benchmark database and synthetic datasets show that MPro enables significant probe reduction, which can be orders of magnitude faster than the standard scheme using complete probing.",
author = "Chang, {Kevin Chen Chuan} and Hwang, {Seung Won}",
year = "2002",
month = "9",
day = "17",
language = "English",
pages = "346--357",
journal = "Proceedings of the ACM SIGMOD International Conference on Management of Data",
issn = "0730-8078",
publisher = "Association for Computing Machinery (ACM)",

}

Minimal probing : Supporting expensive predicates for top-K queries. / Chang, Kevin Chen Chuan; Hwang, Seung Won.

In: Proceedings of the ACM SIGMOD International Conference on Management of Data, 17.09.2002, p. 346-357.

Research output: Contribution to journalConference article

TY - JOUR

T1 - Minimal probing

T2 - Supporting expensive predicates for top-K queries

AU - Chang, Kevin Chen Chuan

AU - Hwang, Seung Won

PY - 2002/9/17

Y1 - 2002/9/17

N2 - This paper addresses the problem of evaluating ranked top-k queries with expensive predicates. As major DBMSs now all support expensive user-defined predicates for Boolean queries, we believe such support for ranked queries will be even more important: First, ranked queries often need to model user-specific concepts of preference, relevance, or similarity, which call for dynamic user-defined functions. Second, middleware systems must incorporate external predicates for integrating autonomous sources typically accessible only by per-object queries. Third, fuzzy joins are inherently expensive, as they are essentially user-defined operations that dynamically associate multiple relations. These predicates, being dynamically defined or externally accessed, cannot rely on index mechanisms to provide zero-time sorted output, and must instead require per-object probe to evaluate. The current standard sort-merge framework for ranked queries cannot efficiently handle such predicates because it must completely probe all objects, before sorting and merging them to produce top-k answers. To minimize expensive probes, we thus develop the formal principle of "necessary probes," which determines if a probe is absolutely required. We then propose Algorithm MPro which, by implementing the principle, is provably optimal with minimal probe cost. Further, we show that MPro can scale well and can be easily parallelized. Our experiments using both a real-estate benchmark database and synthetic datasets show that MPro enables significant probe reduction, which can be orders of magnitude faster than the standard scheme using complete probing.

AB - This paper addresses the problem of evaluating ranked top-k queries with expensive predicates. As major DBMSs now all support expensive user-defined predicates for Boolean queries, we believe such support for ranked queries will be even more important: First, ranked queries often need to model user-specific concepts of preference, relevance, or similarity, which call for dynamic user-defined functions. Second, middleware systems must incorporate external predicates for integrating autonomous sources typically accessible only by per-object queries. Third, fuzzy joins are inherently expensive, as they are essentially user-defined operations that dynamically associate multiple relations. These predicates, being dynamically defined or externally accessed, cannot rely on index mechanisms to provide zero-time sorted output, and must instead require per-object probe to evaluate. The current standard sort-merge framework for ranked queries cannot efficiently handle such predicates because it must completely probe all objects, before sorting and merging them to produce top-k answers. To minimize expensive probes, we thus develop the formal principle of "necessary probes," which determines if a probe is absolutely required. We then propose Algorithm MPro which, by implementing the principle, is provably optimal with minimal probe cost. Further, we show that MPro can scale well and can be easily parallelized. Our experiments using both a real-estate benchmark database and synthetic datasets show that MPro enables significant probe reduction, which can be orders of magnitude faster than the standard scheme using complete probing.

UR - http://www.scopus.com/inward/record.url?scp=0036372482&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0036372482&partnerID=8YFLogxK

M3 - Conference article

SP - 346

EP - 357

JO - Proceedings of the ACM SIGMOD International Conference on Management of Data

JF - Proceedings of the ACM SIGMOD International Conference on Management of Data

SN - 0730-8078

ER -