CP-tree: An adaptive synopsis structure for compressing frequent itemsets over online data streams

Se Jung Shin, Dae Su Lee, Won Suk Lee

Research output: Contribution to journal, Article

19 Citations (Scopus)

Abstract

Due to the characteristics of a data stream, it is very important to confine the memory usage of a data mining process. This paper proposes a CP-tree (Compressible-prefix tree) that can be effectively employed in finding frequent itemsets over an online data stream. Unlike a node of a prefix tree, a node of a CP-tree can maintain a concise synopsis that is used to trace the supports of several itemsets together. As the number of itemsets traced by a single node increases, the CP-tree becomes smaller. However, its result also becomes less accurate, since the estimated supports of the itemsets traced together by a node may contain false-positive or false-negative errors. Based on this characteristic, the size of a CP-tree can be controlled by merging or splitting its nodes, which allows a confined memory space to be utilized as fully as possible. Therefore, the accuracy of a CP-tree is maximized at all times for a confined memory space. Furthermore, a CP-tree can trace a concise set of representative frequent itemsets that collectively represent the original set of frequent itemsets.
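The size/accuracy trade-off described in the abstract can be illustrated with a small sketch. The abstract does not specify the synopsis layout or the estimation rule, so everything below is an assumption made for illustration: the `Node` class, its `c_short` and `c_long` fields, the `merge` helper, and the linear interpolation in `estimate` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One node tracing a chain of nested itemsets (hypothetical layout)."""
    items: list        # items along the merged chain, in prefix order
    c_short: int = 0   # count of the shortest itemset traced by this node
    c_long: int = 0    # count of the longest itemset traced by this node

    def estimate(self, depth: int) -> float:
        """Estimate the count of the itemset ending at items[depth].

        Linear interpolation between the two stored counts is an
        assumption of this sketch, not a formula from the paper.
        """
        if len(self.items) == 1:
            return float(self.c_short)
        frac = depth / (len(self.items) - 1)
        return self.c_short - frac * (self.c_short - self.c_long)


def merge(parent: Node, child: Node) -> Node:
    """Fold a child into its parent so that one node traces both itemsets.

    Only the counts of the shortest and longest itemsets are kept, so the
    counts of the itemsets in between must be estimated afterwards.
    """
    return Node(
        items=parent.items + child.items,
        c_short=parent.c_short,
        c_long=child.c_long,
    )


# Three prefix-tree nodes for the itemsets {a}, {a,b}, {a,b,c}
# with exact counts 10, 9 and 6.
a = Node(["a"], 10, 10)
ab = Node(["b"], 9, 9)
abc = Node(["c"], 6, 6)

# After merging, a single node replaces three, at the cost of accuracy.
merged = merge(merge(a, ab), abc)
print(merged.items, merged.estimate(1))
```

In this toy run the merged node estimates the count of {a,b} as 8.0 while the exact count was 9: three nodes shrink to one, and the middle itemset picks up an estimation error. Splitting the node again would restore exact counting at the cost of more memory, which is the kind of control the abstract describes.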

Original language: English
Pages (from-to): 559-576
Number of pages: 18
Journal: Information Sciences
Volume: 278
Publication status: Published - 2014 Sep 10

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence
