Cell trees: An adaptive synopsis structure for clustering multi-dimensional on-line data streams

Nam Hun Park, Won Suk Lee

Research output: Contribution to journalArticle

23 Citations (Scopus)

Abstract

To effectively trace the clusters of recently generated data elements in an on-line data stream, a sibling list and a cell tree are proposed in this paper. Initially, the multi-dimensional data space of a data stream is partitioned into mutually exclusive equal-sized grid-cells. Each grid-cell monitors the recent distribution statistics of data elements within its range. The old distribution statistics of each grid-cell are diminished by a predefined decay rate as time goes by, so that the effect of the obsolete information on the current result of clustering can be eliminated without maintaining any data element physically. Given a partitioning factor h, a dense grid-cell is partitioned into h equal-size smaller grid-cells. Such partitioning is continued until a grid-cell becomes the smallest one called a unit grid-cell. Conversely, a set of consecutive sparse grid-cells can be merged into a single grid-cell. A sibling list is a structure to manage the set of all grid-cells in a one-dimensional data space and it acts as an index for locating a specific grid-cell. Upon creating a dense unit grid-cell on a one-dimensional data space, a new sibling list for another dimension is created as a child of the grid-cell. In such a way, a cell tree is created. By repeating this process, a multi-dimensional dense unit grid-cell is identified by a path of a cell tree. Furthermore, in order to confine the usage of memory space, the size of a unit grid-cell is adaptively minimized such that the result of clustering becomes as accurate as possible at all times. The proposed method is comparatively analyzed by a series of experiments to identify its various characteristics.

Original languageEnglish
Pages (from-to)528-549
Number of pages22
JournalData and Knowledge Engineering
Volume63
Issue number2
DOIs
Publication statusPublished - 2007 Nov 1

All Science Journal Classification (ASJC) codes

  • Information Systems and Management

Fingerprint Dive into the research topics of 'Cell trees: An adaptive synopsis structure for clustering multi-dimensional on-line data streams'. Together they form a unique fingerprint.

  • Cite this