Exact indexing for support vector machines

Hwanjo Yu, Ilhwan Ko, Youngdae Kim, Seungwon Hwang, Wook Shin Han

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

SVM (Support Vector Machine) is a well-established machine learning methodology popularly used for classification, regression, and ranking. Recently, SVM has been actively researched for rank learning and applied to various applications, including search engines and relevance feedback systems. A query in such systems is the ranking function F learned by SVM. Once the function F is learned (i.e., the query is formulated), processing the query to find the top-k results requires evaluating the entire database with F. So far, there exists no exact indexing solution for SVM functions. Existing top-k query processing algorithms are not applicable to machine-learned ranking functions, as they often make restrictive assumptions on the query, such as linearity or monotonicity of functions. Existing metric-based or reference-based indexing methods are also not applicable, because data points are invisible in the kernel space (SVM feature space) on which the index must be built. Existing kernel indexing methods either return approximate results or fix the kernel parameters. This paper proposes an exact indexing solution for SVM functions with varying kernel parameters. We first identify two key geometric properties of the kernel space, ranking instability and ordering stability, which are crucial for building indices in the kernel space. Based on these properties, we develop an index structure, iKernel, and its query processing algorithms. We then present clustering techniques in the kernel space to enhance the pruning effectiveness of the index. According to our experiments, iKernel is highly effective overall, producing an evaluation ratio of 1 to 5% on large data sets. To the best of our knowledge, iKernel is the first indexing solution that finds the exact top-k results of SVM functions without a full scan of the data set.
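For readers new to the problem setting, the following is a minimal sketch of the full-scan baseline that the abstract says iKernel avoids: a kernel ranking function F(x) = Σ_i c_i K(s_i, x) is evaluated on every data point and the k highest scores are kept. It is not the iKernel index. The Gaussian RBF kernel K(x, y) = exp(−γ‖x − y‖²), the coefficients c_i, the toy data, and all function names are illustrative assumptions; the abstract does not fix a kernel. The last two helpers use the standard kernel-trick identity ‖φ(x) − φ(y)‖² = K(x, x) − 2K(x, y) + K(y, y), which measures distances in the kernel space without ever materializing φ(x), one sense in which data points are "invisible" there.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, gamma: float) -> float:
    """Gaussian RBF kernel K(x, y) = exp(-gamma * ||x - y||^2) (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def ranking_score(x: Vector, support_vectors: List[Vector],
                  coefs: List[float], gamma: float) -> float:
    """Hypothetical kernel ranking function F(x) = sum_i coef_i * K(sv_i, x)."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, coefs))


def top_k_full_scan(data: List[Vector], k: int, support_vectors: List[Vector],
                    coefs: List[float], gamma: float) -> List[Tuple[float, int]]:
    """Exact top-k by scoring every point with F: the full scan an index avoids."""
    scored = ((ranking_score(x, support_vectors, coefs, gamma), i)
              for i, x in enumerate(data))
    # Returns the k largest (score, index) pairs, best first.
    return heapq.nlargest(k, scored)


def kernel_space_sq_distance(x: Vector, y: Vector, gamma: float) -> float:
    """||phi(x) - phi(y)||^2 via the kernel trick, without materializing phi."""
    return (rbf_kernel(x, x, gamma) - 2.0 * rbf_kernel(x, y, gamma)
            + rbf_kernel(y, y, gamma))


def order_by_kernel_distance(query: Vector, data: List[Vector],
                             gamma: float) -> List[int]:
    """Indices of data sorted by kernel-space distance to query."""
    return sorted(range(len(data)),
                  key=lambda i: kernel_space_sq_distance(query, data[i], gamma))


if __name__ == "__main__":
    data = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
    svs = [[0.0, 0.0], [3.0, 3.0]]
    coefs = [1.0, -0.5]
    for gamma in (0.1, 1.0):
        print("gamma", gamma, "top-2:", top_k_full_scan(data, 2, svs, coefs, gamma))
        print("  distance order from data[0]:",
              order_by_kernel_distance(data[0], data, gamma))
```

For the RBF kernel this distance equals 2 − 2exp(−γ‖x − y‖²), which grows monotonically with the input-space distance for every γ > 0, so the distance order of the points stays the same as γ varies even though the scores F(x) change. This is offered only as an illustration of a parameter-independent ordering in the kernel space, not as the paper's formal definition of ordering stability.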

Original language: English
Title of host publication: Proceedings of SIGMOD 2011 and PODS 2011
Pages: 709-720
Number of pages: 12
DOI: https://doi.org/10.1145/1989323.1989398
Publication status: Published - 2011 Jul 11
Event: 2011 ACM SIGMOD and 30th PODS 2011 Conference - Athens, Greece
Duration: 2011 Jun 12 - 2011 Jun 16

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Other

Other: 2011 ACM SIGMOD and 30th PODS 2011 Conference
Country: Greece
City: Athens
Period: 11/6/12 - 11/6/16

Fingerprint

Support vector machines
Query processing
Search engines
Learning systems
Feedback
Processing
Experiments

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Cite this

Yu, H., Ko, I., Kim, Y., Hwang, S., & Han, W. S. (2011). Exact indexing for support vector machines. In Proceedings of SIGMOD 2011 and PODS 2011 (pp. 709-720). (Proceedings of the ACM SIGMOD International Conference on Management of Data). https://doi.org/10.1145/1989323.1989398
Yu, Hwanjo ; Ko, Ilhwan ; Kim, Youngdae ; Hwang, Seungwon ; Han, Wook Shin. / Exact indexing for support vector machines. Proceedings of SIGMOD 2011 and PODS 2011. 2011. pp. 709-720 (Proceedings of the ACM SIGMOD International Conference on Management of Data).
@inproceedings{daaa03e634bf4e3d8e758c3092fd0820,
title = "Exact indexing for support vector machines",
abstract = "SVM (Support Vector Machine) is a well-established machine learning methodology popularly used for classification, regression, and ranking. Recently, SVM has been actively researched for rank learning and applied to various applications, including search engines and relevance feedback systems. A query in such systems is the ranking function F learned by SVM. Once the function F is learned (i.e., the query is formulated), processing the query to find the top-k results requires evaluating the entire database with F. So far, there exists no exact indexing solution for SVM functions. Existing top-k query processing algorithms are not applicable to machine-learned ranking functions, as they often make restrictive assumptions on the query, such as linearity or monotonicity of functions. Existing metric-based or reference-based indexing methods are also not applicable, because data points are invisible in the kernel space (SVM feature space) on which the index must be built. Existing kernel indexing methods either return approximate results or fix the kernel parameters. This paper proposes an exact indexing solution for SVM functions with varying kernel parameters. We first identify two key geometric properties of the kernel space, ranking instability and ordering stability, which are crucial for building indices in the kernel space. Based on these properties, we develop an index structure, iKernel, and its query processing algorithms. We then present clustering techniques in the kernel space to enhance the pruning effectiveness of the index. According to our experiments, iKernel is highly effective overall, producing an evaluation ratio of 1 to 5{\%} on large data sets. To the best of our knowledge, iKernel is the first indexing solution that finds the exact top-k results of SVM functions without a full scan of the data set.",
author = "Hwanjo Yu and Ilhwan Ko and Youngdae Kim and Seungwon Hwang and Han, {Wook Shin}",
year = "2011",
month = "7",
day = "11",
doi = "10.1145/1989323.1989398",
language = "English",
isbn = "9781450306614",
series = "Proceedings of the ACM SIGMOD International Conference on Management of Data",
pages = "709--720",
booktitle = "Proceedings of SIGMOD 2011 and PODS 2011",

}

Yu, H, Ko, I, Kim, Y, Hwang, S & Han, WS 2011, Exact indexing for support vector machines. in Proceedings of SIGMOD 2011 and PODS 2011. Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 709-720, 2011 ACM SIGMOD and 30th PODS 2011 Conference, Athens, Greece, 11/6/12. https://doi.org/10.1145/1989323.1989398

Exact indexing for support vector machines. / Yu, Hwanjo; Ko, Ilhwan; Kim, Youngdae; Hwang, Seungwon; Han, Wook Shin.

Proceedings of SIGMOD 2011 and PODS 2011. 2011. p. 709-720 (Proceedings of the ACM SIGMOD International Conference on Management of Data).

TY  - GEN
T1  - Exact indexing for support vector machines
AU  - Yu, Hwanjo
AU  - Ko, Ilhwan
AU  - Kim, Youngdae
AU  - Hwang, Seungwon
AU  - Han, Wook Shin
PY  - 2011/7/11
Y1  - 2011/7/11
AB  - SVM (Support Vector Machine) is a well-established machine learning methodology popularly used for classification, regression, and ranking. Recently, SVM has been actively researched for rank learning and applied to various applications, including search engines and relevance feedback systems. A query in such systems is the ranking function F learned by SVM. Once the function F is learned (i.e., the query is formulated), processing the query to find the top-k results requires evaluating the entire database with F. So far, there exists no exact indexing solution for SVM functions. Existing top-k query processing algorithms are not applicable to machine-learned ranking functions, as they often make restrictive assumptions on the query, such as linearity or monotonicity of functions. Existing metric-based or reference-based indexing methods are also not applicable, because data points are invisible in the kernel space (SVM feature space) on which the index must be built. Existing kernel indexing methods either return approximate results or fix the kernel parameters. This paper proposes an exact indexing solution for SVM functions with varying kernel parameters. We first identify two key geometric properties of the kernel space, ranking instability and ordering stability, which are crucial for building indices in the kernel space. Based on these properties, we develop an index structure, iKernel, and its query processing algorithms. We then present clustering techniques in the kernel space to enhance the pruning effectiveness of the index. According to our experiments, iKernel is highly effective overall, producing an evaluation ratio of 1 to 5% on large data sets. To the best of our knowledge, iKernel is the first indexing solution that finds the exact top-k results of SVM functions without a full scan of the data set.
UR  - http://www.scopus.com/inward/record.url?scp=79960005590&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=79960005590&partnerID=8YFLogxK
U2  - 10.1145/1989323.1989398
DO  - 10.1145/1989323.1989398
M3  - Conference contribution
SN  - 9781450306614
T3  - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP  - 709
EP  - 720
BT  - Proceedings of SIGMOD 2011 and PODS 2011
ER  - 

Yu H, Ko I, Kim Y, Hwang S, Han WS. Exact indexing for support vector machines. In Proceedings of SIGMOD 2011 and PODS 2011. 2011. p. 709-720. (Proceedings of the ACM SIGMOD International Conference on Management of Data). https://doi.org/10.1145/1989323.1989398