A new approach for processing ranked subsequence matching based on ranked union

Wook Shin Han, Jinsoo Lee, Yang Sae Moon, Seung Won Hwang, Hwanjo Yu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

19 Citations (Scopus)

Abstract

Ranked subsequence matching finds top-k subsequences most similar to a given query sequence from data sequences. Recently, Han et al. [12] proposed a solution (referred to here as HLMJ) to this problem by using the concept of the minimum distance matching window pair (MDMWP) and a global priority queue. By using the concept of MDMWP, HLMJ can prune many unnecessary accesses to data subsequences using a lower bound distance. However, we notice that HLMJ may incur serious performance overhead for important types of queries. In this paper, we propose a novel systematic framework to solve this problem by viewing ranked subsequence matching as ranked union. Specifically, we propose a notion of the matching subsequence equivalence class (MSEQ) and a novel lower bound called the MSEQ-distance. To completely eliminate the performance problem of HLMJ, we also propose a cost-aware density-based scheduling technique, where we consider both the density and cost of the priority queue. Extensive experimental results with many real datasets show that the proposed algorithm outperforms HLMJ and the adapted PSM [22], a state-of-the-art index-based merge algorithm supporting non-monotonic distance functions, by up to two to three orders of magnitude, respectively.
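To make the problem statement concrete, here is a minimal brute-force sketch of ranked (top-k) subsequence matching. It is not HLMJ or the MSEQ-based algorithm from the paper. It assumes Euclidean distance over fixed-length sliding windows, and the function name `top_k_subsequences` and its parameters are hypothetical. The only pruning shown is early abandoning: the partial sum of squared differences is a lower bound on the full squared distance, so a candidate can be dropped once that bound reaches the current k-th best distance.

```python
import heapq
import math


def top_k_subsequences(data_sequences, query, k):
    """Return the k subsequences closest to `query`.

    Results are (distance, sequence_id, offset) tuples sorted by distance.
    Assumption: Euclidean distance on windows of length len(query).
    """
    if k <= 0:
        return []
    m = len(query)
    # Max-heap emulated with negated squared distances, so heap[0]
    # always holds the worst (largest-distance) of the current top-k.
    heap = []
    for seq_id, seq in enumerate(data_sequences):
        for offset in range(len(seq) - m + 1):
            # The k-th best squared distance so far is the pruning threshold.
            threshold = -heap[0][0] if len(heap) == k else math.inf
            sq_dist = 0.0
            for i in range(m):
                diff = seq[offset + i] - query[i]
                sq_dist += diff * diff
                if sq_dist >= threshold:
                    # The partial sum is a lower bound on the full distance,
                    # so this candidate cannot enter the top-k.
                    break
            else:
                # Loop finished without pruning: sq_dist < threshold.
                if len(heap) == k:
                    heapq.heapreplace(heap, (-sq_dist, seq_id, offset))
                else:
                    heapq.heappush(heap, (-sq_dist, seq_id, offset))
    return sorted((math.sqrt(-neg), seq_id, offset) for neg, seq_id, offset in heap)
```

For example, `top_k_subsequences([[0, 0, 5, 6, 7]], [5, 6, 8], 1)` scans three windows and keeps the one at offset 2. This baseline reads every window of every sequence. The index-based approaches described in the abstract aim to avoid most of those accesses by using lower bounds.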

Original language: English
Title of host publication: Proceedings of SIGMOD 2011 and PODS 2011
Pages: 457-468
Number of pages: 12
DOI: https://doi.org/10.1145/1989323.1989371
Publication status: Published - 2011 Jul 11
Event: 2011 ACM SIGMOD and 30th PODS 2011 Conference - Athens, Greece
Duration: 2011 Jun 12 to 2011 Jun 16


Fingerprint

Equivalence classes
Processing
Costs
Scheduling

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Cite this

Han, W. S., Lee, J., Moon, Y. S., Hwang, S. W., & Yu, H. (2011). A new approach for processing ranked subsequence matching based on ranked union. In Proceedings of SIGMOD 2011 and PODS 2011 (pp. 457-468) https://doi.org/10.1145/1989323.1989371
@inproceedings{9f11545798dd40a195c38222eaba6103,
title = "A new approach for processing ranked subsequence matching based on ranked union",
author = "Han, {Wook Shin} and Jinsoo Lee and Moon, {Yang Sae} and Hwang, {Seung Won} and Hwanjo Yu",
year = "2011",
month = "7",
day = "11",
doi = "10.1145/1989323.1989371",
language = "English",
isbn = "9781450306614",
pages = "457--468",
booktitle = "Proceedings of SIGMOD 2011 and PODS 2011",

}


TY - GEN
T1 - A new approach for processing ranked subsequence matching based on ranked union
AU - Han, Wook Shin
AU - Lee, Jinsoo
AU - Moon, Yang Sae
AU - Hwang, Seung Won
AU - Yu, Hwanjo
PY - 2011/7/11
Y1 - 2011/7/11

UR - http://www.scopus.com/inward/record.url?scp=79959918495&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79959918495&partnerID=8YFLogxK
U2 - 10.1145/1989323.1989371
DO - 10.1145/1989323.1989371
M3 - Conference contribution
AN - SCOPUS:79959918495
SN - 9781450306614
SP - 457
EP - 468
BT - Proceedings of SIGMOD 2011 and PODS 2011
ER -
