Indexing weighted-sequences in large databases

Haixun Wang, Chang Shing Perng, Wei Fan, Sanghyun Park, Philip S. Yu

Research output: Contribution to conference › Paper

24 Citations (Scopus)

Abstract

We present an index structure for managing weighted-sequences in large databases. A weighted-sequence is defined as a two-dimensional structure where each element in the sequence is associated with a weight. A series of network events, for instance, is a weighted-sequence in that each event has a timestamp. Querying a large sequence database by events' occurrence patterns is a first step towards understanding the temporal causal relationships among the events. The index structure proposed in this paper enables us to efficiently retrieve from the database all subsequences, possibly non-contiguous, that match a given query sequence both by events and by weights. The index method also takes into consideration the non-uniform frequency distribution of events in the sequence data. In addition, our method finds a broad range of applications in indexing scientific data consisting of multiple numerical columns for discovery of correlations among these columns. For instance, indexing a DNA micro-array that records expression levels of genes under different conditions enables us to search for genes whose responses to various experimental perturbations follow a given pattern. We demonstrate, using real-world data sets, that our method is effective and efficient.

Original language: English
Pages: 63-74
Number of pages: 12
Publication status: Published - 2003 Dec 1
Event: Nineteenth International Conference on Data Engineering - Bangalore, India
Duration: 2003 Mar 5 to 2003 Mar 8

Other

Other: Nineteenth International Conference on Data Engineering
Country: India
City: Bangalore
Period: 03/3/5 to 03/3/8

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Information Systems


  • Cite this

    Wang, H., Perng, C. S., Fan, W., Park, S., & Yu, P. S. (2003). Indexing weighted-sequences in large databases. 63-74. Paper presented at Nineteenth International Conference on Data Engineering, Bangalore, India.