Effect of count estimation in finding frequent itemsets over online transactional data streams

Joong Hyuk Chang, Won Suk Lee

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

A data stream is a massive, unbounded sequence of data elements generated continuously at a rapid rate. For this reason, most algorithms for data streams sacrifice some correctness in their results in exchange for fast processing. Processing time is strongly influenced by the amount of information that must be maintained. This issue becomes more serious when finding frequent itemsets, or counting frequencies, over an online transactional data stream, since a large number of itemsets may need to be monitored. We have proposed a method called the estDec method for finding frequent itemsets over an online data stream. To reduce the number of monitored itemsets in this method, monitoring the count of an itemset is delayed until its support is large enough that it could become a frequent itemset in the near future. For this purpose, the count of an itemset must be estimated. Consequently, how the count of an itemset is estimated is a critical issue in minimizing memory usage as well as processing time. In this paper, the effects of various count estimation methods for finding frequent itemsets are analyzed in terms of mining accuracy, memory usage and processing time.
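The abstract does not give the estimation formulas, so the Python sketch below shows one plausible scheme under stated assumptions, not necessarily any of the estimators compared in the paper. It bounds the count of a not-yet-monitored n-itemset using counts of its monitored subsets. The upper bound is the smallest (n-1)-subset count, since an itemset cannot occur more often than any of its subsets. The lower bound comes from inclusion-exclusion: for two (n-1)-subsets a and b with common part c, count(a ∪ b) ≥ count(a) + count(b) − count(c). The delayed-insertion test then starts monitoring only if the chosen estimate reaches a hypothetical insertion threshold `s_ins`. All function names and the `mode` parameter are illustrative, and the sketch omits any aging of older transactions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

Itemset = FrozenSet[str]


def exact_counts(transactions: Iterable[set], max_size: int) -> Dict[Itemset, int]:
    """Hypothetical demo helper: count every itemset with up to `max_size` items."""
    counts: Dict[Itemset, int] = {}
    for t in transactions:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(t), k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return counts


def estimate_count_bounds(itemset: Itemset, counts: Dict[Itemset, int],
                          total: int) -> Optional[Tuple[int, int]]:
    """Bound the count of an unmonitored n-itemset (n >= 2) from monitored subsets.

    Returns None if any (n-1)-subset is not currently monitored.
    """
    n = len(itemset)
    subs = [frozenset(s) for s in combinations(sorted(itemset), n - 1)]
    if any(s not in counts for s in subs):
        return None
    # Upper bound: an itemset cannot appear more often than any of its subsets.
    upper = min(counts[s] for s in subs)
    # Lower bound: any two distinct (n-1)-subsets a, b satisfy a | b == itemset,
    # so count(itemset) >= count(a) + count(b) - count(a & b).
    # The empty itemset is contained in every transaction, so its count is `total`.
    lower = 0
    for a, b in combinations(subs, 2):
        common = a & b
        if common and common not in counts:
            continue  # cannot use this pair without the count of the common part
        base = counts[common] if common else total
        lower = max(lower, counts[a] + counts[b] - base)
    return min(lower, upper), upper


def should_start_monitoring(itemset: Itemset, counts: Dict[Itemset, int],
                            total: int, s_ins: float, mode: str = "upper") -> bool:
    """Delayed insertion: monitor only if the estimated support reaches s_ins."""
    bounds = estimate_count_bounds(itemset, counts, total)
    if bounds is None or total == 0:
        return False
    lower, upper = bounds
    estimate = {"upper": upper, "lower": lower, "mid": (lower + upper) / 2}[mode]
    return estimate / total >= s_ins


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
counts = exact_counts(transactions, max_size=2)  # only 1- and 2-itemsets monitored
print(estimate_count_bounds(frozenset("abc"), counts, len(transactions)))  # (2, 3)
```

Here the true count of {a, b, c} is 2, so the bounds (2, 3) bracket it. The choice of `mode` illustrates the trade-off the abstract describes: the optimistic upper bound starts monitoring sooner (more itemsets, more memory, better accuracy), while the pessimistic lower bound delays it (fewer monitored itemsets, but a risk of missing counts for itemsets that later become frequent).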

Original language: English
Pages (from-to): 63-69
Number of pages: 7
Journal: Journal of Computer Science and Technology
Volume: 20
Issue number: 1
Publication status: Published - 2005 Jan 1

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Science Applications
  • Computational Theory and Mathematics
