Similarity search of time-warped subsequences via a suffix tree

Sang Hyun Park, Wesley W. Chu, Jeehee Yoon, Jungim Won

Research output: Contribution to journal › Article

19 Citations (Scopus)

Abstract

This paper proposes an indexing technique for fast retrieval of similar subsequences using the time-warping distance. The time-warping distance is a more suitable similarity measure than the Euclidean distance in many applications where sequences may be of different lengths and/or different sampling rates. The proposed indexing technique employs a disk-based suffix tree as an index structure and uses lower-bound distance functions to filter out dissimilar subsequences without false dismissals. To make the index structure compact and hence accelerate the query processing, it converts sequences in the continuous domain into sequences in the discrete domain and stores only a subset of the suffixes whose first values are different from those of the immediately preceding suffixes. Extensive experiments with real and synthetic data sequences revealed that the proposed approach significantly outperforms the sequential scan and LB scan approaches and scales well in a large volume of sequence databases.
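The three ingredients named in the abstract (the time-warping distance, conversion to a discrete domain, and keeping only suffixes whose first value differs from that of the preceding suffix) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the absolute difference is assumed as the base distance, and equal-width binning is assumed for the discretization, since the abstract specifies neither. The disk-based suffix tree and the lower-bound distance functions are not covered.

```python
from typing import List, Sequence


def time_warping_distance(s: Sequence[float], q: Sequence[float]) -> float:
    """Standard dynamic-programming time-warping distance.

    Assumption: |a - b| is used as the base distance between two elements.
    """
    if not s or not q:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # prev[j] is the cost of aligning the elements of s seen so far with q[:j].
    prev = [0.0] + [inf] * len(q)
    for a in s:
        curr = [inf] * (len(q) + 1)
        for j, b in enumerate(q, start=1):
            # Either element may be stretched (repeated), or both advance together.
            curr[j] = abs(a - b) + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(q)]


def discretize(seq: Sequence[float], lo: float, hi: float, n_bins: int) -> List[int]:
    """Map continuous values to bin indices 0..n_bins-1 (equal-width bins assumed)."""
    width = (hi - lo) / n_bins
    return [min(n_bins - 1, max(0, int((v - lo) / width))) for v in seq]


def sparse_suffix_starts(symbols: Sequence[int]) -> List[int]:
    """Start positions of suffixes whose first symbol differs from the
    first symbol of the immediately preceding suffix."""
    return [i for i in range(len(symbols)) if i == 0 or symbols[i] != symbols[i - 1]]


if __name__ == "__main__":
    # Sequences of different lengths can still be at distance zero.
    print(time_warping_distance([1, 2, 3], [1, 2, 2, 3]))
    symbols = discretize([0.1, 0.2, 2.5, 2.7, 9.9], lo=0.0, hi=10.0, n_bins=5)
    print(symbols, sparse_suffix_starts(symbols))
```

In this sketch, runs of equal symbols contribute a single stored suffix, which is the effect the abstract credits for the compact index.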

Original language: English
Pages (from-to): 867-883
Number of pages: 17
Journal: Information Systems
Volume: 28
Issue number: 7
DOIs: 10.1016/S0306-4379(02)00102-3
Publication status: Published - 2003 Jan 1

Fingerprint

Query processing
Sampling
Experiments

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Hardware and Architecture

Cite this

Park, Sang Hyun ; Chu, Wesley W. ; Yoon, Jeehee ; Won, Jungim. / Similarity search of time-warped subsequences via a suffix tree. In: Information Systems. 2003 ; Vol. 28, No. 7. pp. 867-883.
@article{a92e1480cb99433c9dc4c0fe5bb20428,
title = "Similarity search of time-warped subsequences via a suffix tree",
abstract = "This paper proposes an indexing technique for fast retrieval of similar subsequences using the time-warping distance. The time-warping distance is a more suitable similarity measure than the Euclidean distance in many applications where sequences may be of different lengths and/or different sampling rates. The proposed indexing technique employs a disk-based suffix tree as an index structure and uses lower-bound distance functions to filter out dissimilar subsequences without false dismissals. To make the index structure compact and hence accelerate the query processing, it converts sequences in the continuous domain into sequences in the discrete domain and stores only a subset of the suffixes whose first values are different from those of the immediately preceding suffixes. Extensive experiments with real and synthetic data sequences revealed that the proposed approach significantly outperforms the sequential scan and LB scan approaches and scales well in a large volume of sequence databases.",
author = "Park, {Sang Hyun} and Chu, {Wesley W.} and Jeehee Yoon and Jungim Won",
year = "2003",
month = "1",
day = "1",
doi = "10.1016/S0306-4379(02)00102-3",
language = "English",
volume = "28",
pages = "867--883",
journal = "Information Systems",
issn = "0306-4379",
publisher = "Elsevier Limited",
number = "7",
}

Similarity search of time-warped subsequences via a suffix tree. / Park, Sang Hyun; Chu, Wesley W.; Yoon, Jeehee; Won, Jungim.

In: Information Systems, Vol. 28, No. 7, 01.01.2003, p. 867-883.

TY - JOUR

T1 - Similarity search of time-warped subsequences via a suffix tree

AU - Park, Sang Hyun

AU - Chu, Wesley W.

AU - Yoon, Jeehee

AU - Won, Jungim

PY - 2003/1/1

Y1 - 2003/1/1

N2 - This paper proposes an indexing technique for fast retrieval of similar subsequences using the time-warping distance. The time-warping distance is a more suitable similarity measure than the Euclidean distance in many applications where sequences may be of different lengths and/or different sampling rates. The proposed indexing technique employs a disk-based suffix tree as an index structure and uses lower-bound distance functions to filter out dissimilar subsequences without false dismissals. To make the index structure compact and hence accelerate the query processing, it converts sequences in the continuous domain into sequences in the discrete domain and stores only a subset of the suffixes whose first values are different from those of the immediately preceding suffixes. Extensive experiments with real and synthetic data sequences revealed that the proposed approach significantly outperforms the sequential scan and LB scan approaches and scales well in a large volume of sequence databases.

UR - http://www.scopus.com/inward/record.url?scp=0042430582&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0042430582&partnerID=8YFLogxK

U2 - 10.1016/S0306-4379(02)00102-3

DO - 10.1016/S0306-4379(02)00102-3

M3 - Article

AN - SCOPUS:0042430582

VL - 28

SP - 867

EP - 883

JO - Information Systems

JF - Information Systems

SN - 0306-4379

IS - 7

ER -