Efficiently tracing clusters over high-dimensional on-line data streams

Jae Woo Lee, Nam Hun Park, Won Suk Lee

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

A good clustering method should scale flexibly with both the number of dimensions and the size of a data set. This paper proposes a method of efficiently tracing the clusters of a high-dimensional on-line data stream. While the one-dimensional clusters of each dimension are traced independently, a technique similar to frequent itemset mining is employed to find the set of multi-dimensional clusters. By finding a set of frequently co-occurring one-dimensional clusters, it is possible to trace a multi-dimensional rectangular space whose range is defined collectively by those one-dimensional clusters. To trace such candidates over a multi-dimensional on-line data stream, a cluster-statistics tree (CS-tree) is proposed. A node at depth k (k ≤ d) in the CS-tree corresponds to a k-dimensional rectangular space. Each node keeps track of the density of data elements in its corresponding rectangular space. Only a node corresponding to a dense rectangular space is allowed to have a child node. Scalability in the number of dimensions is greatly enhanced at the cost of a slight loss in the accuracy of the identified clusters.
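
The tree structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `CSTreeSketch`, the parameter `min_density`, and the use of fixed, pre-computed one-dimensional intervals are all assumptions made for illustration (the paper traces the one-dimensional clusters on-line). Each data point is mapped to the one-dimensional intervals it falls into, treated like an itemset, and a node at depth k counts the points inside the k-dimensional rectangle formed by the intervals on its path. A node is expanded only while its density meets the threshold.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    count: int = 0                                  # points seen in this rectangle
    children: dict = field(default_factory=dict)    # key: (dimension, interval index)


class CSTreeSketch:
    """Hypothetical, simplified cluster-statistics tree (illustration only)."""

    def __init__(self, intervals, min_density):
        # intervals[d] is a list of (low, high) ranges standing in for the
        # one-dimensional clusters of dimension d (assumed fixed here).
        self.intervals = intervals
        self.min_density = min_density
        self.root = Node()
        self.total = 0

    def _items(self, point):
        # Map a point to the one-dimensional intervals that contain it.
        items = []
        for dim, value in enumerate(point):
            for idx, (low, high) in enumerate(self.intervals[dim]):
                if low <= value < high:
                    items.append((dim, idx))
                    break
        return items

    def _is_dense(self, node):
        return node.count / self.total >= self.min_density

    def insert(self, point):
        self.total += 1
        self._update(self.root, self._items(point))

    def _update(self, node, items):
        # Items are ordered by dimension, so each combination has one path.
        for pos, item in enumerate(items):
            child = node.children.setdefault(item, Node())
            child.count += 1
            # Only a dense rectangle is allowed to have child nodes.
            if self._is_dense(child):
                self._update(child, items[pos + 1:])

    def dense_spaces(self):
        # Return (path, count) for every dense rectangle reachable from the root.
        found = []

        def walk(node, path):
            for item, child in node.children.items():
                if self._is_dense(child):
                    found.append((path + (item,), child.count))
                    walk(child, path + (item,))

        walk(self.root, ())
        return found


if __name__ == "__main__":
    tree = CSTreeSketch(
        intervals=[[(0, 10), (10, 20)], [(0, 5)], [(0, 1)]],
        min_density=0.5,
    )
    for p in [(1, 2, 0.5), (1, 2, 0.5), (1, 2, 0.5), (15, 9, 5)]:
        tree.insert(p)
    for path, count in tree.dense_spaces():
        print(path, count)
```

In this sketch a child node starts counting only once it has been created, so its count can lag behind the true number of points in its rectangle, and early in the stream almost every rectangle looks dense. A practical version would need some form of decay or warm-up, which is outside what the abstract specifies.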

Original language: English
Pages (from-to): 362-379
Number of pages: 18
Journal: Data and Knowledge Engineering
Volume: 68
Issue number: 3
Publication status: Published - 2009 Mar 1

All Science Journal Classification (ASJC) codes

  • Information Systems and Management
