Approximate trace of grid-based clusters over high dimensional data streams

Nam Hun Park, Won Suk Lee

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Clustering a large, high-dimensional data set has always been a serious challenge in data mining. A good clustering method should scale with both the number of dimensions and the size of the data set. We have proposed a grid-based clustering method, called the hybrid-partition method, for an on-line data stream. However, as the dimensionality of the data stream increases, the time and space complexity of this method grows rapidly. In this paper, a sibling list is proposed to find the clusters of a multi-dimensional data space based on the one-dimensional clusters of each dimension. Although the multi-dimensional clusters identified in this way may be less accurate, the one-dimensional approach scales better with the number of dimensions because it requires much less memory than the multi-dimensional approach. The confined space of main memory can therefore be utilized more effectively by the one-dimensional approach.
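The abstract gives no details of the sibling list, so the following is only a rough Python sketch of the general idea it describes: find dense regions in each dimension separately, then combine them into candidate multi-dimensional clusters. Everything here is an assumption made for illustration (the uniform grid, the `min_count` density threshold, the function names, and batch rather than on-line processing); it is not the authors' algorithm. The helper `cell_counts` illustrates the memory argument for a uniform grid: a full d-dimensional grid with k cells per dimension has k^d cells, while d separate one-dimensional grids have only d*k.

```python
from collections import Counter

def cell_index(v, lo, hi, cells):
    """Map a value in [lo, hi] to a grid cell index in [0, cells - 1]."""
    width = (hi - lo) / cells
    return min(int((v - lo) / width), cells - 1)

def one_dim_clusters(values, lo, hi, cells, min_count):
    """Return runs (first_cell, last_cell) of adjacent dense cells in one dimension."""
    counts = Counter(cell_index(v, lo, hi, cells) for v in values)
    dense = sorted(c for c, n in counts.items() if n >= min_count)
    runs = []
    for c in dense:
        if runs and c == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], c)   # extend the current run
        else:
            runs.append((c, c))           # start a new run
    return runs

def run_label(v, lo, hi, cells, runs):
    """Return the index of the 1-D cluster containing v, or None if v is in no cluster."""
    c = cell_index(v, lo, hi, cells)
    for i, (first, last) in enumerate(runs):
        if first <= c <= last:
            return i
    return None

def candidate_clusters(points, lo, hi, cells, min_count):
    """Group points by the tuple of 1-D cluster labels they fall into.

    Hypothetical simplification: a combination of 1-D clusters is kept only
    if enough points actually fall into it, since the product of 1-D clusters
    can cover regions that are empty in the full space.
    """
    dims = len(points[0])
    runs = [one_dim_clusters([p[d] for p in points], lo, hi, cells, min_count)
            for d in range(dims)]
    groups = Counter()
    for p in points:
        key = tuple(run_label(p[d], lo, hi, cells, runs[d]) for d in range(dims))
        if None not in key:
            groups[key] += 1
    return {k: n for k, n in groups.items() if n >= min_count}

def cell_counts(cells, dims):
    """Number of cells in a full multi-dimensional grid vs. per-dimension grids."""
    return cells ** dims, cells * dims
```

In this sketch, each one-dimensional grid is cheap to maintain, but combining the per-dimension clusters only approximates the true multi-dimensional clusters, which reflects the accuracy trade-off the abstract mentions.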

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining - 11th Pacific-Asia Conference, PAKDD 2007, Proceedings
Publisher: Springer Verlag
Pages: 753-760
Number of pages: 8
ISBN (Print): 9783540717003
DOI: https://doi.org/10.1007/978-3-540-71701-0_82
Publication status: Published - 2007
Event: 11th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2007 - Nanjing, China
Duration: 2007 May 22 → 2007 May 25

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4426 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 11th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2007
Country: China
City: Nanjing
Period: 07/5/22 → 07/5/25

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

    Park, N. H., & Lee, W. S. (2007). Approximate trace of grid-based clusters over high dimensional data streams. In Advances in Knowledge Discovery and Data Mining - 11th Pacific-Asia Conference, PAKDD 2007, Proceedings (pp. 753-760). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4426 LNAI). Springer Verlag. https://doi.org/10.1007/978-3-540-71701-0_82