Fast retrieval of similar subsequences in long sequence databases

Sanghyun Park, Dongwon Lee, Wesley W. Chu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

56 Citations (Scopus)

Abstract

Although the Euclidean distance has been the most popular similarity measure in sequence databases, recent techniques prefer high-cost distance functions such as the time warping distance and the editing distance for their wider applicability. However, when these distance functions are applied to the retrieval of similar subsequences, the number of subsequences to be inspected during the search is quadratic in the average length L̃ of the data sequences. We propose a novel subsequence matching scheme, called aligned subsequence matching, in which the number of subsequences to be compared with a query sequence is reduced to linear in L̃. We also present an indexing technique that speeds up aligned subsequence matching using the modified time warping distance as the similarity measure. Experiments on synthetic data sequences demonstrate the effectiveness of the proposed approach: it consistently outperformed sequential scanning and achieved a speed-up of up to 6.5 times.
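To make the cost argument in the abstract concrete, the following is a minimal sketch of a naive sequential scan. It assumes the classic textbook dynamic-programming form of the time warping distance with |a - b| as the base cost; it is not the paper's modified time warping distance, and it does not implement aligned subsequence matching or the proposed index, none of which are specified in this record. The function names are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def time_warping_distance(s: Sequence[float], q: Sequence[float]) -> float:
    """Classic time warping distance (assumed form, base cost |a - b|)."""
    n, m = len(s), len(q)
    inf = float("inf")
    # d[i][j] holds the distance between the first i elements of s
    # and the first j elements of q.
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - q[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def count_all_subsequences(length: int) -> int:
    """Number of contiguous subsequences of a sequence: L * (L + 1) / 2."""
    return length * (length + 1) // 2


def naive_subsequence_search(
    data: Sequence[float], query: Sequence[float], eps: float
) -> List[Tuple[int, int]]:
    """Compare the query against every contiguous subsequence of data.

    Returns (start, end) index pairs, end exclusive, whose distance to the
    query is at most eps. The two nested loops are the quadratic blow-up
    that the abstract describes.
    """
    hits = []
    for start in range(len(data)):
        for end in range(start + 1, len(data) + 1):
            if time_warping_distance(data[start:end], query) <= eps:
                hits.append((start, end))
    return hits
```

For a data sequence of length L, the scan evaluates L(L + 1)/2 candidate subsequences, each with a dynamic-programming distance computation, which is why reducing the candidate count to linear in the sequence length matters.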

Original language: English
Title of host publication: Proceedings - 1999 Workshop on Knowledge and Data Engineering Exchange, KDEX 1999
Editors: Peter Scheuermann
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 60-67
Number of pages: 8
ISBN (Electronic): 0769504531, 9780769504537
DOIs: https://doi.org/10.1109/KDEX.1999.836610
Publication status: Published - 1999 Jan 1
Event: 1999 Workshop on Knowledge and Data Engineering Exchange, KDEX 1999 - Chicago, United States
Duration: 1999 Nov 7 → …

Publication series

Name: Proceedings - 1999 Workshop on Knowledge and Data Engineering Exchange, KDEX 1999

Other

Other: 1999 Workshop on Knowledge and Data Engineering Exchange, KDEX 1999
Country: United States
City: Chicago
Period: 99/11/7 → …

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Information Systems and Management


Cite this

Park, S., Lee, D., & Chu, W. W. (1999). Fast retrieval of similar subsequences in long sequence databases. In P. Scheuermann (Ed.), Proceedings - 1999 Workshop on Knowledge and Data Engineering Exchange, KDEX 1999 (pp. 60-67). [836610] (Proceedings - 1999 Workshop on Knowledge and Data Engineering Exchange, KDEX 1999). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/KDEX.1999.836610