Physical database design for efficient time-series similarity search

Sang Wook Kim, Jinho Kim, Sanghyun Park

Research output: Contribution to journal › Article

Abstract

Similarity search in time-series databases finds data sequences whose changing patterns are similar to that of a query sequence. For efficient processing, it normally employs a multi-dimensional index. To alleviate the well-known curse of dimensionality, previous methods for similarity search apply the Discrete Fourier Transform (DFT) to data sequences and take only the first two or three DFT coefficients as organizing attributes. Beyond this ad-hoc approach, there has been no research effort on devising a systematic guideline for choosing the best organizing attributes. This paper first points out the problems that occur in the previous methods and proposes a novel solution for constructing optimal multi-dimensional indexes. The proposed method analyzes the characteristics of a target time-series database and identifies the organizing attributes with the best discrimination power. It also determines the optimal number of organizing attributes for efficient similarity search by using a cost model. Through a series of experiments, we show that the proposed method significantly outperforms the previous ones.
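The abstract contrasts two ways of choosing organizing attributes for a DFT-based index. One is the conventional "first two or three coefficients" rule. The other is a data-driven choice based on discrimination power. Below is a minimal pure-Python sketch of that contrast, written under several stated assumptions:

- It uses an orthonormal DFT. By Parseval's theorem, the Euclidean distance over any subset of coefficients then lower-bounds the true distance, which is what lets a filter step avoid false dismissals.
- `highest_variance` is a hypothetical stand-in for discrimination power. It ranks coefficients by their variance across the database and is not the authors' criterion.
- The paper's cost model for choosing the number of attributes is not reproduced, so `k` is simply a parameter.
- A linear scan replaces the multi-dimensional index.

All function names here are illustrative.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def dft(x: Sequence[float]) -> List[complex]:
    """Orthonormal DFT: X[f] = n**-0.5 * sum_t x[t] * exp(-2j*pi*f*t/n).

    With this scaling, Parseval's theorem gives sum(x[t]**2) == sum(|X[f]|**2),
    so Euclidean distances are identical in the time and frequency domains.
    """
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def first_k(k: int) -> List[int]:
    """Conventional choice: the first k DFT coefficients (f = 0, ..., k-1)."""
    return list(range(k))


def highest_variance(spectra: Sequence[Sequence[complex]], k: int) -> List[int]:
    """HYPOTHETICAL data-driven choice: the k coefficient positions whose values
    vary most across the database, used as a stand-in for discrimination power.
    This is an assumption for illustration, not the criterion from the paper."""
    m, n = len(spectra), len(spectra[0])
    scored = []
    for f in range(n):
        mean = sum(s[f] for s in spectra) / m
        var = sum(abs(s[f] - mean) ** 2 for s in spectra) / m
        scored.append((var, f))
    scored.sort(key=lambda p: (-p[0], p[1]))  # largest variance first, ties by index
    return sorted(f for _, f in scored[:k])


def feature_distance(X: Sequence[complex], Y: Sequence[complex], attrs: Sequence[int]) -> float:
    """Euclidean distance restricted to the chosen organizing attributes."""
    return math.sqrt(sum(abs(X[f] - Y[f]) ** 2 for f in attrs))


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Exact Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def range_query(
    db: Sequence[Sequence[float]],
    spectra: Sequence[Sequence[complex]],
    query: Sequence[float],
    eps: float,
    attrs: Sequence[int],
) -> Tuple[List[int], List[int]]:
    """Filter-and-refine range query (a linear scan stands in for the index).

    Filter: by Parseval, the distance over any subset of coefficients is a lower
    bound on the true distance, so pruning on it causes no false dismissals.
    Refine: survivors are checked against the exact Euclidean distance.
    """
    Q = dft(query)
    tol = 1e-9  # absorbs floating-point rounding at the eps boundary
    candidates = [
        i for i, X in enumerate(spectra) if feature_distance(X, Q, attrs) <= eps + tol
    ]
    answers = [i for i in candidates if euclidean(db[i], query) <= eps]
    return candidates, answers
```

For example, first compute `spectra = [dft(x) for x in db]`. Then `range_query(db, spectra, q, eps, first_k(3))` and `range_query(db, spectra, q, eps, highest_variance(spectra, 3))` return the same answers, because neither choice of attributes can dismiss a true match. The choices differ only in how many candidates survive the filter and must be refined. That candidate count is the quantity a better choice of organizing attributes aims to reduce.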

Original language: English
Pages (from-to): 1251-1254
Number of pages: 4
Journal: IEICE Transactions on Communications
Volume: E91-B
Issue number: 4
DOI: 10.1093/ietcom/e91-b.4.1251
Publication status: Published - 2008 Jan 1

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Networks and Communications
  • Electrical and Electronic Engineering

Cite this

@article{6b479d9b7b8f4f48a94b26104bc8458b,
title = "Physical database design for efficient time-series similarity search",
abstract = "Similarity search in time-series databases finds such data sequences whose changing patterns are similar to that of a query sequence. For efficient processing, it normally employs a multi-dimensional index. In order to alleviate the well-known dimensionality curse, the previous methods for similarity search apply the Discrete Fourier Transform (DFT) to data sequences, and take only the first two or three DFT coefficients as organizing attributes. Other than this ad-hoc approach, there have been no research efforts on devising a systematic guideline for choosing the best organizing attributes. This paper first points out the problems occurring in the previous methods, and proposes a novel solution to construct optimal multi-dimensional indexes. The proposed method analyzes the characteristics of a target time-series database, and identifies the organizing attributes having the best discrimination power. It also determines the optimal number of organizing attributes for efficient similarity search by using a cost model. Through a series of experiments, we show that the proposed method outperforms the previous ones significantly.",
author = "Kim, {Sang Wook} and Jinho Kim and Sanghyun Park",
year = "2008",
month = "1",
day = "1",
doi = "10.1093/ietcom/e91-b.4.1251",
language = "English",
volume = "E91-B",
pages = "1251--1254",
journal = "IEICE Transactions on Communications",
issn = "0916-8516",
publisher = "Maruzen Co., Ltd/Maruzen Kabushikikaisha",
number = "4",

}

Physical database design for efficient time-series similarity search. / Kim, Sang Wook; Kim, Jinho; Park, Sanghyun.

In: IEICE Transactions on Communications, Vol. E91-B, No. 4, 01.01.2008, p. 1251-1254.

TY  - JOUR
T1  - Physical database design for efficient time-series similarity search
AU  - Kim, Sang Wook
AU  - Kim, Jinho
AU  - Park, Sanghyun
PY  - 2008/1/1
Y1  - 2008/1/1
N2  - Similarity search in time-series databases finds such data sequences whose changing patterns are similar to that of a query sequence. For efficient processing, it normally employs a multi-dimensional index. In order to alleviate the well-known dimensionality curse, the previous methods for similarity search apply the Discrete Fourier Transform (DFT) to data sequences, and take only the first two or three DFT coefficients as organizing attributes. Other than this ad-hoc approach, there have been no research efforts on devising a systematic guideline for choosing the best organizing attributes. This paper first points out the problems occurring in the previous methods, and proposes a novel solution to construct optimal multi-dimensional indexes. The proposed method analyzes the characteristics of a target time-series database, and identifies the organizing attributes having the best discrimination power. It also determines the optimal number of organizing attributes for efficient similarity search by using a cost model. Through a series of experiments, we show that the proposed method outperforms the previous ones significantly.
AB  - Similarity search in time-series databases finds such data sequences whose changing patterns are similar to that of a query sequence. For efficient processing, it normally employs a multi-dimensional index. In order to alleviate the well-known dimensionality curse, the previous methods for similarity search apply the Discrete Fourier Transform (DFT) to data sequences, and take only the first two or three DFT coefficients as organizing attributes. Other than this ad-hoc approach, there have been no research efforts on devising a systematic guideline for choosing the best organizing attributes. This paper first points out the problems occurring in the previous methods, and proposes a novel solution to construct optimal multi-dimensional indexes. The proposed method analyzes the characteristics of a target time-series database, and identifies the organizing attributes having the best discrimination power. It also determines the optimal number of organizing attributes for efficient similarity search by using a cost model. Through a series of experiments, we show that the proposed method outperforms the previous ones significantly.
UR  - http://www.scopus.com/inward/record.url?scp=67651015928&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=67651015928&partnerID=8YFLogxK
U2  - 10.1093/ietcom/e91-b.4.1251
DO  - 10.1093/ietcom/e91-b.4.1251
M3  - Article
AN  - SCOPUS:67651015928
VL  - E91-B
SP  - 1251
EP  - 1254
JO  - IEICE Transactions on Communications
JF  - IEICE Transactions on Communications
SN  - 0916-8516
IS  - 4
ER  -