Identifying Recurrent and Unknown Performance Issues

Meng Hui Lim, Jian Guang Lou, Hongyu Zhang, Qiang Fu, Beng Jin Teoh, Qingwei Lin, Rui Ding, Dongmei Zhang

Research output: Contribution to journal › Conference article

2 Citations (Scopus)

Abstract

For a large-scale software system, especially an online service system, when a performance issue occurs, it is desirable to check whether this issue has occurred before. If there are past similar issues, a known remedy could be applied. Otherwise, a new troubleshooting process may have to be initiated. The symptom of a performance issue can be characterized by a set of metrics. Due to the sophisticated nature of software systems, manual diagnosis of performance issues based on metric data is typically expensive and laborious. In this paper, we propose a Hidden Markov Random Field (HMRF) based approach to automatic identification of recurrent and unknown performance issues. We formulate the problem of issue identification as a HMRF-based clustering problem. Our approach incorporates the learning of metric discretization thresholds and the optimization of issue clustering. Based on the learned thresholds and cluster centroids, we can achieve accurate identification of recurrent issues and unknown issues. Experimental evaluations on an open benchmark and a large-scale industrial production system show that our approach is effective and outperforms the related state-of-the-art approaches.
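The identification step described in the abstract (matching a new issue against learned thresholds and cluster centroids) can be illustrated with a minimal sketch. It does not reproduce the HMRF-based learning: the thresholds and centroids below are hand-set rather than learned, and the binary discretization, the Hamming distance, the `max_distance` cutoff, the metric names, and the cluster labels are all assumptions made for illustration, not details taken from the paper.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def discretize(metrics: Sequence[float], thresholds: Sequence[float]) -> List[int]:
    """Map each raw metric to 1 (abnormal) if it exceeds its threshold, else 0.

    Binary discretization is an assumption of this sketch.
    """
    return [1 if value > limit else 0 for value, limit in zip(metrics, thresholds)]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two discretized symptom vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def identify(
    metrics: Sequence[float],
    thresholds: Sequence[float],
    centroids: Dict[str, List[int]],
    max_distance: int,
) -> Tuple[Optional[str], int]:
    """Return (label, distance) for a recurrent issue, or (None, distance) if unknown.

    An issue counts as recurrent when its discretized symptom lies within
    max_distance of the nearest centroid (a hypothetical decision rule).
    """
    symptom = discretize(metrics, thresholds)
    label, distance = min(
        ((name, hamming(symptom, centroid)) for name, centroid in centroids.items()),
        key=lambda pair: pair[1],
    )
    if distance <= max_distance:
        return label, distance
    return None, distance


# Hypothetical metrics: CPU utilisation, latency in ms, error rate.
EXAMPLE_THRESHOLDS = [0.8, 500.0, 0.05]
# Hypothetical centroids of two previously seen issue clusters.
EXAMPLE_CENTROIDS = {"db_overload": [1, 1, 0], "bad_deploy": [0, 1, 1]}

# Matches the "db_overload" centroid exactly, so it is reported as recurrent.
print(identify([0.93, 820.0, 0.01], EXAMPLE_THRESHOLDS, EXAMPLE_CENTROIDS, max_distance=0))
# Differs from every centroid in two positions, so it is reported as unknown.
print(identify([0.95, 120.0, 0.20], EXAMPLE_THRESHOLDS, EXAMPLE_CENTROIDS, max_distance=0))
```

In this sketch a recurrent issue returns the label of the matching cluster, for which a known remedy could be looked up, while an unknown issue returns `None` and would start a new troubleshooting process.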

Original language: English
Article number: 7023349
Pages (from-to): 320-329
Number of pages: 10
Journal: Proceedings - IEEE International Conference on Data Mining, ICDM
Volume: 2015-January
Issue number: January
DOIs: https://doi.org/10.1109/ICDM.2014.96
Publication status: Published - 2015 Jan 26
Event: 14th IEEE International Conference on Data Mining, ICDM 2014 - Shenzhen, China
Duration: 2014 Dec 14 → 2014 Dec 17

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Cite this

Lim, M. H., Lou, J. G., Zhang, H., Fu, Q., Teoh, B. J., Lin, Q., ... Zhang, D. (2015). Identifying Recurrent and Unknown Performance Issues. Proceedings - IEEE International Conference on Data Mining, ICDM, 2015-January(January), 320-329. [7023349]. https://doi.org/10.1109/ICDM.2014.96
Lim, Meng Hui ; Lou, Jian Guang ; Zhang, Hongyu ; Fu, Qiang ; Teoh, Beng Jin ; Lin, Qingwei ; Ding, Rui ; Zhang, Dongmei. / Identifying Recurrent and Unknown Performance Issues. In: Proceedings - IEEE International Conference on Data Mining, ICDM. 2015 ; Vol. 2015-January, No. January. pp. 320-329.
@article{61898eb3b430405288c911738e744554,
title = "Identifying Recurrent and Unknown Performance Issues",
author = "Lim, {Meng Hui} and Lou, {Jian Guang} and Hongyu Zhang and Qiang Fu and Teoh, {Beng Jin} and Qingwei Lin and Rui Ding and Dongmei Zhang",
year = "2015",
month = "1",
day = "26",
doi = "10.1109/ICDM.2014.96",
language = "English",
volume = "2015-January",
pages = "320--329",
journal = "Proceedings - IEEE International Conference on Data Mining, ICDM",
issn = "1550-4786",
number = "January",

}

Lim, MH, Lou, JG, Zhang, H, Fu, Q, Teoh, BJ, Lin, Q, Ding, R & Zhang, D 2015, 'Identifying Recurrent and Unknown Performance Issues', Proceedings - IEEE International Conference on Data Mining, ICDM, vol. 2015-January, no. January, 7023349, pp. 320-329. https://doi.org/10.1109/ICDM.2014.96

Identifying Recurrent and Unknown Performance Issues. / Lim, Meng Hui; Lou, Jian Guang; Zhang, Hongyu; Fu, Qiang; Teoh, Beng Jin; Lin, Qingwei; Ding, Rui; Zhang, Dongmei.

In: Proceedings - IEEE International Conference on Data Mining, ICDM, Vol. 2015-January, No. January, 7023349, 26.01.2015, p. 320-329.

TY - JOUR

T1 - Identifying Recurrent and Unknown Performance Issues

AU - Lim, Meng Hui

AU - Lou, Jian Guang

AU - Zhang, Hongyu

AU - Fu, Qiang

AU - Teoh, Beng Jin

AU - Lin, Qingwei

AU - Ding, Rui

AU - Zhang, Dongmei

PY - 2015/1/26

Y1 - 2015/1/26

AB - For a large-scale software system, especially an online service system, when a performance issue occurs, it is desirable to check whether this issue has occurred before. If there are past similar issues, a known remedy could be applied. Otherwise, a new troubleshooting process may have to be initiated. The symptom of a performance issue can be characterized by a set of metrics. Due to the sophisticated nature of software systems, manual diagnosis of performance issues based on metric data is typically expensive and laborious. In this paper, we propose a Hidden Markov Random Field (HMRF) based approach to automatic identification of recurrent and unknown performance issues. We formulate the problem of issue identification as a HMRF-based clustering problem. Our approach incorporates the learning of metric discretization thresholds and the optimization of issue clustering. Based on the learned thresholds and cluster centroids, we can achieve accurate identification of recurrent issues and unknown issues. Experimental evaluations on an open benchmark and a large-scale industrial production system show that our approach is effective and outperforms the related state-of-the-art approaches.

UR - http://www.scopus.com/inward/record.url?scp=84936948357&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84936948357&partnerID=8YFLogxK

U2 - 10.1109/ICDM.2014.96

DO - 10.1109/ICDM.2014.96

M3 - Conference article

VL - 2015-January

SP - 320

EP - 329

JO - Proceedings - IEEE International Conference on Data Mining, ICDM

JF - Proceedings - IEEE International Conference on Data Mining, ICDM

SN - 1550-4786

IS - January

M1 - 7023349

ER -