Effect of count estimation in finding frequent itemsets over online transactional data streams

Joong Hyuk Chang, Won Suk Lee

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. For this reason, most algorithms for data streams sacrifice the correctness of their results for fast processing time. The processing time is greatly influenced by the amount of information that must be maintained. This issue becomes more serious in finding frequent itemsets or frequency counting over an online transactional data stream, since there can be a large number of itemsets to be monitored. We have proposed a method called the estDec method for finding frequent itemsets over an online data stream. To reduce the number of monitored itemsets in this method, monitoring the count of an itemset is delayed until its support is large enough for it to become a frequent itemset in the near future. For this purpose, the count of an itemset must be estimated. Consequently, how the count of an itemset is estimated is a critical issue in minimizing memory usage as well as processing time. In this paper, the effects of various count estimation methods for finding frequent itemsets are analyzed in terms of mining accuracy, memory usage and processing time.
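The abstract does not say which count estimators the paper compares. As a hedged illustration only, the Python sketch below shows two textbook ways to estimate the count of an n-itemset that is not yet monitored, using the counts of its monitored subsets. The first is the minimum count over its (n-1)-subsets, which is an upper bound. The second is an inclusion-exclusion lower bound. The function names, the insertion threshold `s_ins`, and the use of plain (undecayed) counts are all assumptions for illustration. They are not taken from the paper, and estDec's actual estimators and thresholds may differ.

```python
from itertools import combinations

# Hypothetical sketch (not the paper's code): delayed monitoring driven by
# an estimated count. Assumes `counts` maps frozensets to exact counts, is
# downward closed, and holds counts[frozenset()] == number of transactions.

def subsets_minus_one(itemset):
    """All (n-1)-subsets of an n-itemset, as frozensets."""
    return [frozenset(c) for c in combinations(sorted(itemset), len(itemset) - 1)]

def estimate_upper(itemset, counts):
    """Upper bound: an itemset occurs no more often than any of its subsets."""
    return min(counts[s] for s in subsets_minus_one(itemset))

def estimate_lower(itemset, counts):
    """Lower bound via inclusion-exclusion: for two (n-1)-subsets a, b,
    C(a | b) >= C(a) + C(b) - C(a & b); take the best pair, clipped at 0."""
    best = 0
    for a, b in combinations(subsets_minus_one(itemset), 2):
        best = max(best, counts[a] + counts[b] - counts[a & b])
    return best

def should_monitor(itemset, counts, total, s_ins, estimator=estimate_upper):
    """Start monitoring only if every (n-1)-subset is already monitored and
    the estimated support reaches the hypothetical insertion threshold s_ins."""
    if any(s not in counts for s in subsets_minus_one(itemset)):
        return False
    return estimator(itemset, counts) / total >= s_ins

# Toy usage: monitor the empty set, all 1-itemsets and all 2-itemsets exactly.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]

def exact_count(itemset, txs):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in txs if itemset <= t)

counts = {}
for k in range(3):
    for combo in combinations("abc", k):
        s = frozenset(combo)
        counts[s] = exact_count(s, transactions)

abc = frozenset("abc")
print(estimate_upper(abc, counts), estimate_lower(abc, counts), exact_count(abc, transactions))  # 3 2 2
print(should_monitor(abc, counts, 5, 0.5), should_monitor(abc, counts, 5, 0.5, estimate_lower))  # True False
```

On this toy data, the upper-bound estimate admits {a, b, c} for monitoring at `s_ins = 0.5` and the lower bound does not. Depending on the estimator, an itemset enters the monitored set sooner or later. That choice shifts memory usage and processing time against mining accuracy, which are the three measures the abstract lists.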

Original language: English
Pages (from-to): 63-69
Number of pages: 7
Journal: Journal of Computer Science and Technology
Volume: 20
Issue number: 1
DOI: 10.1007/s11390-005-0007-3
Publication status: Published - 2005 Jan 1

Fingerprint

Frequent Itemsets
Data Streams
Count
Processing
Data storage equipment
Mining
Counting
Correctness
Monitoring
Estimate

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this

@article{9c6de459442443f495d868dfd4e42b2d,
title = "Effect of count estimation in finding frequent itemsets over online transactional data streams",
abstract = "A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. For this reason, most algorithms for data streams sacrifice the correctness of their results for fast processing time. The processing time is greatly influenced by the amount of information that must be maintained. This issue becomes more serious in finding frequent itemsets or frequency counting over an online transactional data stream, since there can be a large number of itemsets to be monitored. We have proposed a method called the estDec method for finding frequent itemsets over an online data stream. To reduce the number of monitored itemsets in this method, monitoring the count of an itemset is delayed until its support is large enough for it to become a frequent itemset in the near future. For this purpose, the count of an itemset must be estimated. Consequently, how the count of an itemset is estimated is a critical issue in minimizing memory usage as well as processing time. In this paper, the effects of various count estimation methods for finding frequent itemsets are analyzed in terms of mining accuracy, memory usage and processing time.",
author = "Chang, {Joong Hyuk} and Lee, {Won Suk}",
year = "2005",
month = "1",
day = "1",
doi = "10.1007/s11390-005-0007-3",
language = "English",
volume = "20",
pages = "63--69",
journal = "Journal of Computer Science and Technology",
issn = "1000-9000",
publisher = "Springer New York",
number = "1",

}

Effect of count estimation in finding frequent itemsets over online transactional data streams. / Chang, Joong Hyuk; Lee, Won Suk.

In: Journal of Computer Science and Technology, Vol. 20, No. 1, 01.01.2005, p. 63-69.

TY - JOUR

T1 - Effect of count estimation in finding frequent itemsets over online transactional data streams

AU - Chang, Joong Hyuk

AU - Lee, Won Suk

PY - 2005/1/1

Y1 - 2005/1/1

AB - A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. For this reason, most algorithms for data streams sacrifice the correctness of their results for fast processing time. The processing time is greatly influenced by the amount of information that must be maintained. This issue becomes more serious in finding frequent itemsets or frequency counting over an online transactional data stream, since there can be a large number of itemsets to be monitored. We have proposed a method called the estDec method for finding frequent itemsets over an online data stream. To reduce the number of monitored itemsets in this method, monitoring the count of an itemset is delayed until its support is large enough for it to become a frequent itemset in the near future. For this purpose, the count of an itemset must be estimated. Consequently, how the count of an itemset is estimated is a critical issue in minimizing memory usage as well as processing time. In this paper, the effects of various count estimation methods for finding frequent itemsets are analyzed in terms of mining accuracy, memory usage and processing time.

UR - http://www.scopus.com/inward/record.url?scp=18544366849&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=18544366849&partnerID=8YFLogxK

U2 - 10.1007/s11390-005-0007-3

DO - 10.1007/s11390-005-0007-3

M3 - Article

AN - SCOPUS:18544366849

VL - 20

SP - 63

EP - 69

JO - Journal of Computer Science and Technology

JF - Journal of Computer Science and Technology

SN - 1000-9000

IS - 1

ER -