TY - GEN
T1 - Grid-based subspace clustering over data streams
AU - Park, Nam Hun
AU - Lee, Won Suk
PY - 2007
Y1 - 2007
N2 - A real-life data stream usually contains many dimensions and some dimensional values of its data elements may be missing. In order to effectively extract the on-going change of a data stream with respect to all the subsets of the dimensions of the data stream, a grid-based subspace clustering algorithm is proposed in this paper. Given an n-dimensional data stream, the on-going distribution statistics of data elements in each one-dimensional data space are first monitored by a list of grid-cells called a sibling list. Once a dense grid-cell of a first-level sibling list becomes a dense unit grid-cell, new second-level sibling lists are created as its child nodes in order to trace any cluster in all possible two-dimensional rectangular subspaces. In such a way, a sibling tree grows up to the nth level at most and a k-dimensional subcluster can be found in the kth level of the sibling tree. The proposed method is comparatively analyzed by a series of experiments to identify its various characteristics.
AB - A real-life data stream usually contains many dimensions and some dimensional values of its data elements may be missing. In order to effectively extract the on-going change of a data stream with respect to all the subsets of the dimensions of the data stream, a grid-based subspace clustering algorithm is proposed in this paper. Given an n-dimensional data stream, the on-going distribution statistics of data elements in each one-dimensional data space are first monitored by a list of grid-cells called a sibling list. Once a dense grid-cell of a first-level sibling list becomes a dense unit grid-cell, new second-level sibling lists are created as its child nodes in order to trace any cluster in all possible two-dimensional rectangular subspaces. In such a way, a sibling tree grows up to the nth level at most and a k-dimensional subcluster can be found in the kth level of the sibling tree. The proposed method is comparatively analyzed by a series of experiments to identify its various characteristics.
UR - http://www.scopus.com/inward/record.url?scp=63449122793&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=63449122793&partnerID=8YFLogxK
U2 - 10.1145/1321440.1321551
DO - 10.1145/1321440.1321551
M3 - Conference contribution
AN - SCOPUS:63449122793
SN - 9781595938039
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 801
EP - 810
BT - CIKM 2007 - Proceedings of the 16th ACM Conference on Information and Knowledge Management
T2 - 16th ACM Conference on Information and Knowledge Management, CIKM 2007
Y2 - 6 November 2007 through 9 November 2007
ER -