An index-based approach for similarity search supporting time warping in large sequence databases

Sang Wook Kim, Sang Hyun Park, Wesley W. Chu

Research output: Contribution to journal › Article

201 Citations (Scopus)

Abstract

This paper proposes a novel method for similarity search that supports time warping in large sequence databases. Time warping enables finding sequences with similar patterns even when they are of different lengths. Previous methods for similarity search that support time warping cannot employ multi-dimensional indexes without false dismissal, since the time warping distance does not satisfy the triangular inequality. Our primary goal is to improve search performance without permitting any false dismissal. To attain this goal, we devise a new distance function, Dtw-lb, that consistently underestimates the time warping distance and also satisfies the triangular inequality. Dtw-lb uses a 4-tuple feature vector that is extracted from each sequence and is invariant to time warping. For efficient processing of similarity search, we employ a multi-dimensional index that uses the 4-tuple feature vector as indexing attributes and Dtw-lb as a distance function. Extensive experimental results reveal that our method achieves a speedup of up to 43 times with real-world S&P 500 stock data and up to 720 times with very large synthetic data.
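
The filter-and-refine idea in the abstract can be illustrated with a minimal sketch. The abstract does not name the four features, so this sketch assumes they are the first, last, greatest, and smallest elements of a sequence, and that the time warping distance uses an L-infinity base (the largest element-wise gap along the best warping path). The function names are hypothetical, and a plain linear scan stands in for the multi-dimensional index.

```python
from typing import List, Sequence, Tuple

Features = Tuple[float, float, float, float]


def features(seq: Sequence[float]) -> Features:
    """Assumed 4-tuple: (first, last, greatest, smallest) element.
    Stretching a sequence along the time axis leaves all four unchanged."""
    return (seq[0], seq[-1], max(seq), min(seq))


def dtw_lb(a: Features, b: Features) -> float:
    """L-infinity distance between feature vectors. It is a metric on the
    feature space, so it satisfies the triangular inequality."""
    return max(abs(x - y) for x, y in zip(a, b))


def dtw(s: Sequence[float], q: Sequence[float]) -> float:
    """Time warping distance with an L-infinity base (an assumption here):
    the smallest achievable 'largest gap' over all warping paths."""
    n, m = len(s), len(q)
    inf = float("inf")
    # cost[i][j] = distance between the prefixes s[:i] and q[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = max(abs(s[i - 1] - q[j - 1]), best_prev)
    return cost[n][m]


def range_search(db: List[Sequence[float]], query: Sequence[float],
                 eps: float) -> List[Sequence[float]]:
    """Return every sequence within eps of the query, with no false dismissal."""
    qf = features(query)
    hits = []
    for seq in db:  # linear scan standing in for an index lookup
        if dtw_lb(features(seq), qf) > eps:
            continue  # lower bound already exceeds eps, so dtw must too
        if dtw(seq, query) <= eps:  # refine: discard false alarms
            hits.append(seq)
    return hits
```

Because the lower bound never exceeds the true distance under these assumptions, pruning on it cannot discard a qualifying sequence; it can only let through candidates that the exact distance then rejects.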

Original language: English
Pages (from-to): 607-614
Number of pages: 8
Journal: Proceedings - International Conference on Data Engineering
DOI: 10.1109/ICDE.2001.914875
Publication status: Published - 2001 Jan 1


All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Information Systems

Cite this

@article{67f8243c19a4486883f4b600818ea9c2,
title = "An index-based approach for similarity search supporting time warping in large sequence databases",
abstract = "This paper proposes a novel method for similarity search that supports time warping in large sequence databases. Time warping enables finding sequences with similar patterns even when they are of different lengths. Previous methods for similarity search that support time warping cannot employ multi-dimensional indexes without false dismissal, since the time warping distance does not satisfy the triangular inequality. Our primary goal is to improve search performance without permitting any false dismissal. To attain this goal, we devise a new distance function, Dtw-lb, that consistently underestimates the time warping distance and also satisfies the triangular inequality. Dtw-lb uses a 4-tuple feature vector that is extracted from each sequence and is invariant to time warping. For efficient processing of similarity search, we employ a multi-dimensional index that uses the 4-tuple feature vector as indexing attributes and Dtw-lb as a distance function. Extensive experimental results reveal that our method achieves a speedup of up to 43 times with real-world S\&P 500 stock data and up to 720 times with very large synthetic data.",
author = "Kim, {Sang Wook} and Park, {Sang Hyun} and Chu, {Wesley W.}",
year = "2001",
month = "1",
day = "1",
doi = "10.1109/ICDE.2001.914875",
language = "English",
pages = "607--614",
journal = "Proceedings - International Conference on Data Engineering",
issn = "1084-4627",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY - JOUR

T1 - An index-based approach for similarity search supporting time warping in large sequence databases

AU - Kim, Sang Wook

AU - Park, Sang Hyun

AU - Chu, Wesley W.

PY - 2001/1/1

Y1 - 2001/1/1

N2 - This paper proposes a novel method for similarity search that supports time warping in large sequence databases. Time warping enables finding sequences with similar patterns even when they are of different lengths. Previous methods for similarity search that support time warping cannot employ multi-dimensional indexes without false dismissal, since the time warping distance does not satisfy the triangular inequality. Our primary goal is to improve search performance without permitting any false dismissal. To attain this goal, we devise a new distance function, Dtw-lb, that consistently underestimates the time warping distance and also satisfies the triangular inequality. Dtw-lb uses a 4-tuple feature vector that is extracted from each sequence and is invariant to time warping. For efficient processing of similarity search, we employ a multi-dimensional index that uses the 4-tuple feature vector as indexing attributes and Dtw-lb as a distance function. Extensive experimental results reveal that our method achieves a speedup of up to 43 times with real-world S&P 500 stock data and up to 720 times with very large synthetic data.

AB - This paper proposes a novel method for similarity search that supports time warping in large sequence databases. Time warping enables finding sequences with similar patterns even when they are of different lengths. Previous methods for similarity search that support time warping cannot employ multi-dimensional indexes without false dismissal, since the time warping distance does not satisfy the triangular inequality. Our primary goal is to improve search performance without permitting any false dismissal. To attain this goal, we devise a new distance function, Dtw-lb, that consistently underestimates the time warping distance and also satisfies the triangular inequality. Dtw-lb uses a 4-tuple feature vector that is extracted from each sequence and is invariant to time warping. For efficient processing of similarity search, we employ a multi-dimensional index that uses the 4-tuple feature vector as indexing attributes and Dtw-lb as a distance function. Extensive experimental results reveal that our method achieves a speedup of up to 43 times with real-world S&P 500 stock data and up to 720 times with very large synthetic data.

UR - http://www.scopus.com/inward/record.url?scp=0034995991&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0034995991&partnerID=8YFLogxK

U2 - 10.1109/ICDE.2001.914875

DO - 10.1109/ICDE.2001.914875

M3 - Article

SP - 607

EP - 614

JO - Proceedings - International Conference on Data Engineering

JF - Proceedings - International Conference on Data Engineering

SN - 1084-4627

ER -