Privacy preserving data mining of sequential patterns for network traffic data

Seung Woo Kim, Sang Hyun Park, Jung Im Won, Sang Wook Kim

Research output: Contribution to journal > Article

31 Citations (Scopus)

Abstract

As the total amount of traffic data in networks has been growing at an alarming rate, there is currently a substantial body of research that attempts to mine traffic data with the purpose of obtaining useful information. For instance, there are some investigations into the detection of Internet worms and intrusions by discovering abnormal traffic patterns. However, since network traffic data contain information about the Internet usage patterns of users, network users' privacy may be compromised during the mining process. In this paper, we propose an efficient and practical method that preserves privacy during sequential pattern mining on network traffic data. In order to discover frequent sequential patterns without violating privacy, our method uses the N-repository server model, which operates as a single mining server, and the retention replacement technique, which changes the answer to a query probabilistically. In addition, our method accelerates the overall mining process by maintaining the meta tables in each site so as to determine quickly whether candidate patterns have ever occurred in the site or not. Extensive experiments with real-world network traffic data revealed the correctness and the efficiency of the proposed method.
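The abstract describes retention replacement only in one clause, so the following is a minimal illustrative sketch of the general idea rather than the authors' implementation. It assumes each site answers a yes/no query ("did this candidate pattern occur here?"), keeps the true answer with a retention probability p, and otherwise replaces it with a fair coin flip. The function names, the fair-coin replacement distribution, and the parameter values are assumptions made for illustration; the paper may use different choices.

```python
import random


def perturb(answer: bool, p: float, rng: random.Random) -> bool:
    """Retain the true answer with probability p; otherwise replace it
    with a fair coin flip (assumed replacement distribution)."""
    if rng.random() < p:
        return answer
    return rng.random() < 0.5


def estimate_true_fraction(observed_fraction: float, p: float) -> float:
    """Invert E[observed] = p * true + (1 - p) * 0.5 to recover an
    estimate of the true fraction of 'yes' answers."""
    return (observed_fraction - (1.0 - p) * 0.5) / p


def simulate(true_fraction: float, p: float, n: int, seed: int = 0) -> float:
    """Perturb n hypothetical answers and return the reconstructed estimate."""
    rng = random.Random(seed)
    yes_count = 0
    for _ in range(n):
        true_answer = rng.random() < true_fraction
        if perturb(true_answer, p, rng):
            yes_count += 1
    return estimate_true_fraction(yes_count / n, p)


if __name__ == "__main__":
    # Hypothetical setting: 30% of answers are truly 'yes', retention p = 0.8.
    print(simulate(true_fraction=0.3, p=0.8, n=50_000))
```

No single perturbed answer can be trusted, yet the aggregate frequency can still be estimated, which is what lets a mining server decide whether a candidate pattern is frequent. A lower p gives stronger privacy at the cost of a noisier estimate.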

Original language: English
Pages (from-to): 694-713
Number of pages: 20
Journal: Information Sciences
Volume: 178
Issue number: 3
DOI: 10.1016/j.ins.2007.08.022
Publication status: Published - 2008 Feb 1

Fingerprint

Privacy Preserving Data Mining
Sequential Patterns
Network Traffic
Data mining
Servers
Internet
Privacy
Process Mining
Traffic
Mining
Server
Addition method
Frequent Pattern
Worm
Experiments
Repository
Accelerate
Replacement
Tables
Correctness

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence

Cite this

Kim, Seung Woo ; Park, Sang Hyun ; Won, Jung Im ; Kim, Sang Wook. / Privacy preserving data mining of sequential patterns for network traffic data. In: Information Sciences. 2008 ; Vol. 178, No. 3. pp. 694-713.
@article{d0cbf20596524652bee8c1706cef8162,
title = "Privacy preserving data mining of sequential patterns for network traffic data",
abstract = "As the total amount of traffic data in networks has been growing at an alarming rate, there is currently a substantial body of research that attempts to mine traffic data with the purpose of obtaining useful information. For instance, there are some investigations into the detection of Internet worms and intrusions by discovering abnormal traffic patterns. However, since network traffic data contain information about the Internet usage patterns of users, network users' privacy may be compromised during the mining process. In this paper, we propose an efficient and practical method that preserves privacy during sequential pattern mining on network traffic data. In order to discover frequent sequential patterns without violating privacy, our method uses the N-repository server model, which operates as a single mining server and the retention replacement technique, which changes the answer to a query probabilistically. In addition, our method accelerates the overall mining process by maintaining the meta tables in each site so as to determine quickly whether candidate patterns have ever occurred in the site or not. Extensive experiments with real-world network traffic data revealed the correctness and the efficiency of the proposed method.",
author = "Kim, {Seung Woo} and Park, {Sang Hyun} and Won, {Jung Im} and Kim, {Sang Wook}",
year = "2008",
month = "2",
day = "1",
doi = "10.1016/j.ins.2007.08.022",
language = "English",
volume = "178",
pages = "694--713",
journal = "Information Sciences",
issn = "0020-0255",
publisher = "Elsevier Inc.",
number = "3",

}

TY - JOUR
T1 - Privacy preserving data mining of sequential patterns for network traffic data
AU - Kim, Seung Woo
AU - Park, Sang Hyun
AU - Won, Jung Im
AU - Kim, Sang Wook
PY - 2008/2/1
Y1 - 2008/2/1

N2 - As the total amount of traffic data in networks has been growing at an alarming rate, there is currently a substantial body of research that attempts to mine traffic data with the purpose of obtaining useful information. For instance, there are some investigations into the detection of Internet worms and intrusions by discovering abnormal traffic patterns. However, since network traffic data contain information about the Internet usage patterns of users, network users' privacy may be compromised during the mining process. In this paper, we propose an efficient and practical method that preserves privacy during sequential pattern mining on network traffic data. In order to discover frequent sequential patterns without violating privacy, our method uses the N-repository server model, which operates as a single mining server and the retention replacement technique, which changes the answer to a query probabilistically. In addition, our method accelerates the overall mining process by maintaining the meta tables in each site so as to determine quickly whether candidate patterns have ever occurred in the site or not. Extensive experiments with real-world network traffic data revealed the correctness and the efficiency of the proposed method.

UR - http://www.scopus.com/inward/record.url?scp=35748979448&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=35748979448&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2007.08.022
DO - 10.1016/j.ins.2007.08.022
M3 - Article
AN - SCOPUS:35748979448
VL - 178
SP - 694
EP - 713
JO - Information Sciences
JF - Information Sciences
SN - 0020-0255
IS - 3
ER -