### Abstract

This paper proposes a grid-based clustering method that dynamically partitions the range of a grid cell according to the distribution statistics of the data elements it receives from a data stream. Initially, the multi-dimensional space of the data domain is partitioned into a set of mutually exclusive, equal-size initial cells. As new data elements are continuously generated, each cell monitors the distribution statistics of the data elements within its range. When the support of the data elements in a cell becomes high enough, the cell is dynamically divided into two mutually exclusive smaller cells, called intermediate cells, under the assumption that the data elements follow a normal distribution. The dense sub-range of an initial cell is recursively partitioned in this way until it reaches the smallest cell size, called a unit cell. To minimize the number of cells, a sparse intermediate or unit cell can be pruned if its support falls far below a minimum support. The performance of the proposed method is comparatively analyzed through a series of experiments.
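The split step described in the abstract can be illustrated with a minimal one-dimensional sketch. It is not the authors' algorithm: the abstract does not give the exact partition rule, so this sketch assumes that a dense cell is cut at its running mean and that each child inherits half of the parent's count (a normal distribution is symmetric about its mean). The names `Cell` and `StreamGrid` and the parameters `split_support`, `min_obs`, and `unit_width` are hypothetical, and pruning of sparse cells is left out.

```python
import random
from dataclasses import dataclass


@dataclass
class Cell:
    lo: float           # inclusive lower bound of the cell's range
    hi: float           # exclusive upper bound of the cell's range
    count: float = 0.0  # support count, including the share inherited at a split
    n: int = 0          # elements observed since this cell was created
    s1: float = 0.0     # running sum of those elements (for the mean)

    def mean(self) -> float:
        return self.s1 / self.n


class StreamGrid:
    """Hypothetical 1-D grid; every name and threshold here is an assumption."""

    def __init__(self, lo, hi, n_initial, split_support, min_obs, unit_width):
        step = (hi - lo) / n_initial
        # Shared edge values guarantee that neighbouring cells tile the domain.
        edges = [lo + i * step for i in range(n_initial)] + [hi]
        self.cells = [Cell(a, b) for a, b in zip(edges, edges[1:])]
        self.seen = 0                       # total elements seen so far
        self.split_support = split_support  # support needed before a split
        self.min_obs = min_obs              # observations needed before a split
        self.unit_width = unit_width        # cells this narrow are unit cells

    def support(self, cell):
        return cell.count / self.seen if self.seen else 0.0

    def insert(self, x):
        idx = next((i for i, c in enumerate(self.cells) if c.lo <= x < c.hi), None)
        if idx is None:
            raise ValueError(f"{x} is outside the data domain")
        self.seen += 1
        cell = self.cells[idx]
        cell.count += 1
        cell.n += 1
        cell.s1 += x
        if (cell.n >= self.min_obs
                and self.support(cell) >= self.split_support
                and cell.hi - cell.lo > self.unit_width):
            # Replace the dense cell with its two children, keeping the list sorted.
            self.cells[idx:idx + 1] = self._split(cell)

    def _split(self, cell):
        cut = cell.mean()
        if not cell.lo < cut < cell.hi:
            cut = (cell.lo + cell.hi) / 2  # fall back to the midpoint
        half = cell.count / 2              # symmetric share under the normal assumption
        return [Cell(cell.lo, cut, half), Cell(cut, cell.hi, half)]

    def dense_cells(self, min_support):
        return [c for c in self.cells if self.support(c) >= min_support]


if __name__ == "__main__":
    rng = random.Random(0)
    grid = StreamGrid(0.0, 10.0, n_initial=5, split_support=0.2,
                      min_obs=20, unit_width=0.25)
    for _ in range(2000):
        x = rng.gauss(5.0, 0.3)
        if 0.0 <= x < 10.0:
            grid.insert(x)
    for c in grid.dense_cells(0.05):
        print(f"[{c.lo:.3f}, {c.hi:.3f}) support={grid.support(c):.3f}")
```

Children start with fresh running statistics, so each one must collect `min_obs` new elements before it can split again. That keeps the recursion tied to data that actually arrives in the child's range rather than to the parent's history.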

Original language | English |
---|---|
Pages (from-to) | 387-398 |
Number of pages | 12 |
Journal | Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) |
Volume | 2838 |
Publication status | Published - 2003 Dec 1 |
Event | 7th European Conference on Principles and Practice of Knowledge Discovery in Databases - Cavtat-Dubrovnik, Croatia. Duration: 2003 Sep 22 → 2003 Sep 26 |

### All Science Journal Classification (ASJC) codes

- Theoretical Computer Science
- Computer Science(all)

### Cite this

**Statistical σ-partition clustering over data streams.** / Park, Nam Hun; Lee, Won Suk.

In: *Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)*, vol. 2838, pp. 387-398.

Research output: Contribution to journal › Conference article

TY - JOUR

T1 - Statistical σ-partition clustering over data streams

AU - Park, Nam Hun

AU - Lee, Won Suk

PY - 2003/12/1

Y1 - 2003/12/1

UR - http://www.scopus.com/inward/record.url?scp=9444258641&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=9444258641&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:9444258641

VL - 2838

SP - 387

EP - 398

JO - Lecture Notes in Computer Science

JF - Lecture Notes in Computer Science

SN - 0302-9743

ER -