Optimized combinatorial clustering for stochastic processes

Jumi Kim, Wookey Lee, Justin Jongsu Song, Soo-Bok Lee

Research output: Contribution to journal (Article)

19 Citations (Scopus)

Abstract

As a new data processing era of Big Data, Cloud Computing, and the Internet of Things approaches, the amount of data collected in databases far exceeds our ability to reduce and analyze it without automated analysis techniques, that is, data mining. As the importance of data mining has grown, a critical issue has emerged: how to scale data mining techniques to larger and more complex databases. This is particularly imperative for computationally intensive data mining tasks such as identifying natural clusters of instances. In this paper, we suggest an optimized combinatorial clustering algorithm for noisy performance, which is essential for large data sets with random sampling. The algorithm outperforms conventional approaches on various numerical and qualitative measures, such as the mean and standard deviation of accuracy and computation speed.

Original language: English
Pages (from-to): 1135-1148
Number of pages: 14
Journal: Cluster Computing
Volume: 20
Issue number: 2
DOIs: 10.1007/s10586-017-0763-1
Publication status: Published - 2017 Jun 1

Fingerprint

Random processes
Data mining
Cloud computing
Clustering algorithms
Sampling

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Networks and Communications

Cite this

Kim, Jumi; Lee, Wookey; Song, Justin Jongsu; Lee, Soo-Bok. Optimized combinatorial clustering for stochastic processes. In: Cluster Computing. 2017; Vol. 20, No. 2. pp. 1135-1148.
@article{f9faa211e49449e8817698b1a432375e,
title = "Optimized combinatorial clustering for stochastic processes",
abstract = "As a new data processing era like Big Data, Cloud Computing, and Internet of Things approaches, the amount of data being collected in databases far exceeds the ability to reduce and analyze these data without the use of automated analysis techniques, data mining. As the importance of data mining has grown, one of the critical issues to emerge is how to scale data mining techniques to larger and complex databases so that it is particularly imperative for computationally intensive data mining tasks such as identifying natural clusters of instances. In this paper, we suggest an optimized combinatorial clustering algorithm for noisy performance which is essential for large data with random sampling. The algorithm outperforms conventional approaches through various numerical and qualitative thresholds like mean and standard deviation of accuracy and computation speed.",
author = "Jumi Kim and Wookey Lee and Song, {Justin Jongsu} and Soo-Bok Lee",
year = "2017",
month = "6",
day = "1",
doi = "10.1007/s10586-017-0763-1",
language = "English",
volume = "20",
pages = "1135--1148",
journal = "Cluster Computing",
issn = "1386-7857",
publisher = "Kluwer Academic Publishers",
number = "2",

}

TY - JOUR

T1 - Optimized combinatorial clustering for stochastic processes

AU - Kim, Jumi

AU - Lee, Wookey

AU - Song, Justin Jongsu

AU - Lee, Soo-Bok

PY - 2017/6/1

Y1 - 2017/6/1

N2 - As a new data processing era like Big Data, Cloud Computing, and Internet of Things approaches, the amount of data being collected in databases far exceeds the ability to reduce and analyze these data without the use of automated analysis techniques, data mining. As the importance of data mining has grown, one of the critical issues to emerge is how to scale data mining techniques to larger and complex databases so that it is particularly imperative for computationally intensive data mining tasks such as identifying natural clusters of instances. In this paper, we suggest an optimized combinatorial clustering algorithm for noisy performance which is essential for large data with random sampling. The algorithm outperforms conventional approaches through various numerical and qualitative thresholds like mean and standard deviation of accuracy and computation speed.

UR - http://www.scopus.com/inward/record.url?scp=85013448323&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85013448323&partnerID=8YFLogxK

U2 - 10.1007/s10586-017-0763-1

DO - 10.1007/s10586-017-0763-1

M3 - Article

AN - SCOPUS:85013448323

VL - 20

SP - 1135

EP - 1148

JO - Cluster Computing

JF - Cluster Computing

SN - 1386-7857

IS - 2

ER -