Memory efficient subspace clustering for online data streams

Nam Hun Park, Won Suk Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Subspace clustering over an online multi-dimensional data stream requires examining all the subsets of its dimensions, so a huge amount of memory space may be required. To trace the ongoing changes of cluster patterns over an online data stream within a confined memory space, this paper proposes a grid-based subspace clustering algorithm that can utilize the confined memory space effectively. Given an n-dimensional data stream, the ongoing distribution statistics of data elements in each one-dimensional data space are first monitored by a list of grid-cells called a sibling list. Once a grid-cell of a first-level sibling list becomes a dense unit grid-cell, new second-level sibling lists are created as its child nodes in order to trace any cluster in all possible two-dimensional rectangular subspaces. In this way, a sibling tree grows up to the nth level at most, and a k-dimensional subcluster can be found at the kth level of the sibling tree. To utilize the confined space of main memory effectively, only the upper part of a sibling tree is expanded at all times, and the subtrees in the lower part are expanded in turn by various scheduling policies such as round-robin and priority-based scheduling. Furthermore, in order to confine the usage of memory space, the size of a unit grid-cell is adaptively minimized such that the result of clustering becomes as accurate as possible at all times. The performance of the proposed method is comparatively analyzed through a number of experiments to identify its various characteristics.
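The sibling-tree idea in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the class names (`SiblingNode`, `SiblingTree`), the fixed `cell_width`, and the absolute-count density threshold `min_count` are all assumptions made for illustration, and the memory-bounded subtree scheduling and adaptive cell sizing described above are left out. It only shows how child sibling lists might be created once a grid-cell becomes dense, so that a k-dimensional dense unit appears at level k.

```python
class SiblingNode:
    """One grid-cell: a hit count plus child sibling lists (hypothetical layout)."""

    def __init__(self):
        self.count = 0
        # dimension index -> {cell index -> SiblingNode}; one dict per sibling list
        self.children = {}


class SiblingTree:
    """Simplified sibling tree; density is an absolute count (an assumption)."""

    def __init__(self, n_dims, cell_width, min_count):
        self.n_dims = n_dims
        self.cell_width = cell_width
        self.min_count = min_count
        self.root = SiblingNode()  # virtual root holding the first-level lists

    def _cell(self, value):
        # Map a coordinate to the index of the fixed-width grid-cell containing it.
        return int(value // self.cell_width)

    def insert(self, point):
        """Process one stream element."""
        self.root.count += 1
        self._update(self.root, point, 0)

    def _update(self, node, point, start_dim):
        # Only dimensions after the parent's are used, so each subspace
        # is traced along exactly one path of the tree.
        for dim in range(start_dim, self.n_dims):
            cells = node.children.setdefault(dim, {})
            child = cells.setdefault(self._cell(point[dim]), SiblingNode())
            child.count += 1
            # Child sibling lists are only maintained below dense cells.
            if child.count >= self.min_count:
                self._update(child, point, dim + 1)

    def dense_units(self):
        """Return (unit, count) pairs; a unit is a tuple of (dim, cell) pairs."""
        found = []

        def walk(node, path):
            for dim, cells in node.children.items():
                for idx, child in cells.items():
                    if child.count >= self.min_count:
                        unit = path + ((dim, idx),)
                        found.append((unit, child.count))
                        walk(child, unit)

        walk(self.root, ())
        return found


if __name__ == "__main__":
    tree = SiblingTree(n_dims=3, cell_width=1.0, min_count=3)
    stream = [(0.5, 0.5, 0.1), (0.6, 0.4, 1.2), (0.7, 0.3, 2.5),
              (0.2, 0.9, 3.7), (0.1, 0.8, 4.9)]
    for p in stream:
        tree.insert(p)
    for unit, count in tree.dense_units():
        print(len(unit), unit, count)  # level, cells of the unit, hit count
```

In this toy stream the first two dimensions stay within one cell while the third spreads out, so the sketch reports dense one-dimensional units on dimensions 0 and 1 and one dense two-dimensional unit at level 2. Counts below a dense cell only accumulate from the moment that cell becomes dense, which is a property of this sketch and not a claim about the paper.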

Original language: English
Title of host publication: Proceedings of IDEAS'08
Subtitle of host publication: International Database Engineering and Applications Symposium
Pages: 199-208
Number of pages: 10
DOIs: https://doi.org/10.1145/1451940.1451968
Publication status: Published - 2008 Dec 1
Event: International Database Engineering and Applications Symposium, IDEAS'08 - Coimbra, Portugal
Duration: 2008 Sep 10 to 2008 Sep 12

Publication series

Name: ACM International Conference Proceeding Series
Volume: 299

Other

Other: International Database Engineering and Applications Symposium, IDEAS'08
Country: Portugal
City: Coimbra
Period: 08/9/10 to 08/9/12

Fingerprint

Data storage equipment
Clustering algorithms
Scheduling
Statistics
Experiments

All Science Journal Classification (ASJC) codes

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Cite this

Park, N. H., & Lee, W. S. (2008). Memory efficient subspace clustering for online data streams. In Proceedings of IDEAS'08: International Database Engineering and Applications Symposium (pp. 199-208). (ACM International Conference Proceeding Series; Vol. 299). https://doi.org/10.1145/1451940.1451968
Park, Nam Hun ; Lee, Won Suk. / Memory efficient subspace clustering for online data streams. Proceedings of IDEAS'08: International Database Engineering and Applications Symposium. 2008. pp. 199-208 (ACM International Conference Proceeding Series).
@inproceedings{fdc0b6efa7b24d7eb738ec8bcc5d43d9,
title = "Memory efficient subspace clustering for online data streams",
abstract = "Subspace clustering over an online multi-dimensional data stream requires to examine all the subsets of its dimensions, so that a huge amount of memory space may be required. To trace the ongoing changes of cluster patterns over an online data stream by a confined memory space, this paper proposes a grid-based subspace clustering algorithm that can utilize the confined memory space effectively. Given an n-dimensional data stream, the on-going distribution statistics of data elements in each one-dimension data space are firstly monitored by a list of grid-cells called a sibling list. Once a grid-cell of a first-level sibling list becomes a dense unit grid-cell, new second-level sibling lists are created as its child nodes in order to trace any cluster in all possible two-dimensional rectangular subspaces. In such a way, a sibling tree grows up to the nth level at most and a k-dimensional subcluster can be found at the kth level of the sibling tree. To utilize the confined space of main memory effectively, only the upper-part of a sibling tree is expanded at all times and the subtrees in the lower part are expanded in turns by various scheduling policies such as round-robin and priority-based. Furthermore, in order to confine the usage of memory space, the size of a unit grid-cell is adaptively minimized such that the result of clustering becomes as accurate as possible at all times. The performance of the proposed method is comparatively analyzed by a number of experiments to identify its various characteristics.",
author = "Park, {Nam Hun} and Lee, {Won Suk}",
year = "2008",
month = "12",
day = "1",
doi = "10.1145/1451940.1451968",
language = "English",
isbn = "9781605581880",
series = "ACM International Conference Proceeding Series",
pages = "199--208",
booktitle = "Proceedings of IDEAS'08",
}

Park, NH & Lee, WS 2008, Memory efficient subspace clustering for online data streams. in Proceedings of IDEAS'08: International Database Engineering and Applications Symposium. ACM International Conference Proceeding Series, vol. 299, pp. 199-208, International Database Engineering and Applications Symposium, IDEAS'08, Coimbra, Portugal, 08/9/10. https://doi.org/10.1145/1451940.1451968

Memory efficient subspace clustering for online data streams. / Park, Nam Hun; Lee, Won Suk.

Proceedings of IDEAS'08: International Database Engineering and Applications Symposium. 2008. p. 199-208 (ACM International Conference Proceeding Series; Vol. 299).

TY - GEN

T1 - Memory efficient subspace clustering for online data streams

AU - Park, Nam Hun

AU - Lee, Won Suk

PY - 2008/12/1

Y1 - 2008/12/1

N2 - Subspace clustering over an online multi-dimensional data stream requires to examine all the subsets of its dimensions, so that a huge amount of memory space may be required. To trace the ongoing changes of cluster patterns over an online data stream by a confined memory space, this paper proposes a grid-based subspace clustering algorithm that can utilize the confined memory space effectively. Given an n-dimensional data stream, the on-going distribution statistics of data elements in each one-dimension data space are firstly monitored by a list of grid-cells called a sibling list. Once a grid-cell of a first-level sibling list becomes a dense unit grid-cell, new second-level sibling lists are created as its child nodes in order to trace any cluster in all possible two-dimensional rectangular subspaces. In such a way, a sibling tree grows up to the nth level at most and a k-dimensional subcluster can be found at the kth level of the sibling tree. To utilize the confined space of main memory effectively, only the upper-part of a sibling tree is expanded at all times and the subtrees in the lower part are expanded in turns by various scheduling policies such as round-robin and priority-based. Furthermore, in order to confine the usage of memory space, the size of a unit grid-cell is adaptively minimized such that the result of clustering becomes as accurate as possible at all times. The performance of the proposed method is comparatively analyzed by a number of experiments to identify its various characteristics.

UR - http://www.scopus.com/inward/record.url?scp=77954443477&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77954443477&partnerID=8YFLogxK

U2 - 10.1145/1451940.1451968

DO - 10.1145/1451940.1451968

M3 - Conference contribution

AN - SCOPUS:77954443477

SN - 9781605581880

T3 - ACM International Conference Proceeding Series

SP - 199

EP - 208

BT - Proceedings of IDEAS'08

ER -

Park NH, Lee WS. Memory efficient subspace clustering for online data streams. In Proceedings of IDEAS'08: International Database Engineering and Applications Symposium. 2008. p. 199-208. (ACM International Conference Proceeding Series). https://doi.org/10.1145/1451940.1451968