CP-tree: An adaptive synopsis structure for compressing frequent itemsets over online data streams

Se Jung Shin, Dae Su Lee, Won Suk Lee

Research output: Contribution to journal › Article

18 Citations (Scopus)

Abstract

Due to the characteristics of a data stream, it is very important to confine the memory usage of a data mining process. This paper proposes a CP-tree (Compressible-prefix tree) that can be effectively employed in finding frequent itemsets over an online data stream. Unlike a prefix tree, a node of a CP-tree can maintain a concise synopsis that can be used to trace the supports of several itemsets together. As the number of itemsets that are traced by a node of a CP-tree is increased, the size of a CP-tree becomes smaller. However, the result of a CP-tree becomes less accurate since the estimated supports of those itemsets that are traced together by a node of a CP-tree may contain possible false positive or negative errors. Based on this characteristic, the size of a CP-tree can be controlled by merging or splitting the nodes of a CP-tree, which allows the utilization of a confined memory space as much as possible. Therefore, the accuracy of a CP-tree is maximized at all times for a confined memory space. Furthermore, a CP-tree can trace a concise set of representative frequent itemsets that can collectively represent the set of original frequent itemsets.
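The size/accuracy trade-off described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: the names `MergedNode`, `merge_chain`, and the threshold `delta`, the greedy merge rule, and the linear-interpolation estimate are all assumptions made for illustration. The sketch collapses one prefix path into fewer nodes, each keeping only the counts of its shortest and longest itemsets, and estimates the counts in between.

```python
from dataclasses import dataclass


@dataclass
class MergedNode:
    """Hypothetical synopsis node covering several itemsets on one prefix path."""
    items: list[str]   # items along the merged path, shortest itemset first
    c_top: int         # exact count of the shortest itemset in the node
    c_bottom: int      # exact count of the longest itemset in the node

    def estimate(self, depth: int) -> float:
        """Estimate the count of the itemset ending at items[depth].

        Linear interpolation between the two stored counts is an
        illustrative assumption, not the estimator used in the paper.
        """
        if len(self.items) == 1:
            return float(self.c_top)
        frac = depth / (len(self.items) - 1)
        return self.c_top - frac * (self.c_top - self.c_bottom)


def merge_chain(items: list[str], counts: list[int], delta: int) -> list[MergedNode]:
    """Greedily merge consecutive nodes of one prefix path.

    A node is extended while the gap between its top count and the next
    count stays within delta. A larger delta gives fewer nodes (less
    memory) but coarser estimates.
    """
    nodes = []
    start = 0
    for i in range(1, len(items) + 1):
        if i == len(items) or counts[start] - counts[i] > delta:
            nodes.append(MergedNode(items[start:i], counts[start], counts[i - 1]))
            start = i
    return nodes


# Counts along the path a, ab, abc, abcd (non-increasing, as supports are).
path_items = ["a", "b", "c", "d"]
path_counts = [100, 98, 97, 60]

exact = merge_chain(path_items, path_counts, delta=0)     # no merging
compact = merge_chain(path_items, path_counts, delta=5)   # merges a, ab, abc
coarse = merge_chain(path_items, path_counts, delta=100)  # one node for the path
```

With `delta=5` the path shrinks from four nodes to two and the estimate for `ab` is off by 0.5; with `delta=100` a single node remains and the same estimate is off by more than 11. Raising or lowering a threshold of this kind is one way to picture how merging and splitting nodes could trade accuracy for memory.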

Original language: English
Pages (from-to): 559-576
Number of pages: 18
Journal: Information Sciences
Volume: 278
DOI: 10.1016/j.ins.2014.03.074
Publication status: Published - 2014 Sep 10

Fingerprint

Frequent Itemsets
Prefix
Data Streams
Data storage equipment
Merging
Data mining
Vertex of a graph
Data streams
Trace
False Positive
Data Mining

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence

Cite this

@article{8a7bdd1efa6241f182402d65ddab2561,
title = "CP-tree: An adaptive synopsis structure for compressing frequent itemsets over online data streams",
abstract = "Due to the characteristics of a data stream, it is very important to confine the memory usage of a data mining process. This paper proposes a CP-tree (Compressible-prefix tree) that can be effectively employed in finding frequent itemsets over an online data stream. Unlike a prefix tree, a node of a CP-tree can maintain a concise synopsis that can be used to trace the supports of several itemsets together. As the number of itemsets that are traced by a node of a CP-tree is increased, the size of a CP-tree becomes smaller. However, the result of a CP-tree becomes less accurate since the estimated supports of those itemsets that are traced together by a node of a CP-tree may contain possible false positive or negative errors. Based on this characteristic, the size of a CP-tree can be controlled by merging or splitting the nodes of a CP-tree, which allows the utilization of a confined memory space as much as possible. Therefore, the accuracy of a CP-tree is maximized at all times for a confined memory space. Furthermore, a CP-tree can trace a concise set of representative frequent itemsets that can collectively represent the set of original frequent itemsets.",
author = "Shin, {Se Jung} and Lee, {Dae Su} and Lee, {Won Suk}",
year = "2014",
month = "9",
day = "10",
doi = "10.1016/j.ins.2014.03.074",
language = "English",
volume = "278",
pages = "559--576",
journal = "Information Sciences",
issn = "0020-0255",
publisher = "Elsevier Inc.",
}

CP-tree: An adaptive synopsis structure for compressing frequent itemsets over online data streams. / Shin, Se Jung; Lee, Dae Su; Lee, Won Suk.

In: Information Sciences, Vol. 278, 10.09.2014, p. 559-576.

TY - JOUR
T1 - CP-tree
T2 - An adaptive synopsis structure for compressing frequent itemsets over online data streams
AU - Shin, Se Jung
AU - Lee, Dae Su
AU - Lee, Won Suk
PY - 2014/9/10
Y1 - 2014/9/10
AB - Due to the characteristics of a data stream, it is very important to confine the memory usage of a data mining process. This paper proposes a CP-tree (Compressible-prefix tree) that can be effectively employed in finding frequent itemsets over an online data stream. Unlike a prefix tree, a node of a CP-tree can maintain a concise synopsis that can be used to trace the supports of several itemsets together. As the number of itemsets that are traced by a node of a CP-tree is increased, the size of a CP-tree becomes smaller. However, the result of a CP-tree becomes less accurate since the estimated supports of those itemsets that are traced together by a node of a CP-tree may contain possible false positive or negative errors. Based on this characteristic, the size of a CP-tree can be controlled by merging or splitting the nodes of a CP-tree, which allows the utilization of a confined memory space as much as possible. Therefore, the accuracy of a CP-tree is maximized at all times for a confined memory space. Furthermore, a CP-tree can trace a concise set of representative frequent itemsets that can collectively represent the set of original frequent itemsets.
UR - http://www.scopus.com/inward/record.url?scp=84901833382&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84901833382&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2014.03.074
DO - 10.1016/j.ins.2014.03.074
M3 - Article
AN - SCOPUS:84901833382
VL - 278
SP - 559
EP - 576
JO - Information Sciences
JF - Information Sciences
SN - 0020-0255
ER -