Identifying Recurrent and Unknown Performance Issues

Meng Hui Lim, Jian Guang Lou, Hongyu Zhang, Qiang Fu, Andrew Beng Jin Teoh, Qingwei Lin, Rui Ding, Dongmei Zhang

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

6 Citations (Scopus)

Abstract

For a large-scale software system, especially an online service system, when a performance issue occurs, it is desirable to check whether the issue has occurred before. If similar issues have occurred in the past, a known remedy can be applied; otherwise, a new troubleshooting process may have to be initiated. The symptom of a performance issue can be characterized by a set of metrics. Because software systems are complex, manual diagnosis of performance issues from metric data is typically expensive and laborious. In this paper, we propose a Hidden Markov Random Field (HMRF)-based approach to the automatic identification of recurrent and unknown performance issues. We formulate issue identification as an HMRF-based clustering problem. Our approach incorporates both the learning of metric discretization thresholds and the optimization of issue clustering. Based on the learned thresholds and cluster centroids, we can accurately identify both recurrent issues and unknown issues. Experimental evaluations on an open benchmark and a large-scale industrial production system show that our approach is effective and outperforms related state-of-the-art approaches.
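The abstract describes the final decision as a two-step process: metric values are discretized using learned thresholds, and the resulting symptom is compared against the cluster centroids of past issues to decide whether an issue is recurrent or unknown. The Python sketch below illustrates only that decision step under stated assumptions; it is not the paper's HMRF method. It assumes hypothetical two-sided (lower, upper) thresholds per metric giving three states, an L1 distance to each centroid, and a hypothetical `max_distance` cutoff for declaring an issue unknown. The metric names, threshold values, and issue labels in the usage lines are invented for illustration, and the threshold learning and HMRF clustering that would actually produce `thresholds` and `centroids` are not shown.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical per-metric thresholds: one (lower, upper) pair per metric.
Thresholds = Sequence[Tuple[float, float]]


def discretize(values: Sequence[float], thresholds: Thresholds) -> List[int]:
    """Map each metric value to -1 (below lower), 0 (normal) or +1 (above upper)."""
    if len(values) != len(thresholds):
        raise ValueError("one (lower, upper) threshold pair is needed per metric")
    states = []
    for value, (lower, upper) in zip(values, thresholds):
        if value < lower:
            states.append(-1)
        elif value > upper:
            states.append(1)
        else:
            states.append(0)
    return states


def identify(
    values: Sequence[float],
    thresholds: Thresholds,
    centroids: Dict[str, Sequence[float]],
    max_distance: float,
) -> Optional[str]:
    """Return the label of the closest known issue, or None if the issue looks unknown."""
    signature = discretize(values, thresholds)
    best_label, best_dist = None, float("inf")
    for label, centroid in centroids.items():
        if len(centroid) != len(signature):
            raise ValueError(f"centroid {label!r} has the wrong number of metrics")
        # Assumed L1 distance between the discrete signature and a centroid,
        # which may hold fractional values (e.g. averaged states).
        dist = sum(abs(s - c) for s, c in zip(signature, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    # Recurrent if close enough to a known cluster, otherwise unknown (None).
    return best_label if best_dist <= max_distance else None


# Hypothetical metrics: CPU %, latency in ms, error rate.
thresholds = [(0.0, 80.0), (0.0, 500.0), (0.0, 0.05)]
centroids = {"cpu-saturation": [1, 1, 0], "backend-errors": [0, 1, 1]}
print(identify([95.0, 900.0, 0.01], thresholds, centroids, max_distance=0.5))  # cpu-saturation
print(identify([20.0, 100.0, 0.2], thresholds, centroids, max_distance=0.5))  # None
```

In the second call the signature is `[0, 0, 1]`, whose nearest centroid (`backend-errors`) is at distance 1, beyond the assumed cutoff of 0.5, so the issue is reported as unknown. With an empty centroid set every issue is treated as unknown, which matches the intuition that nothing can recur before anything has been seen.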

Original language: English
Title of host publication: Proceedings - 14th IEEE International Conference on Data Mining, ICDM 2014
Editors: Ravi Kumar, Hannu Toivonen, Jian Pei, Joshua Zhexue Huang, Xindong Wu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 320-329
Number of pages: 10
Edition: January
ISBN (Electronic): 9781479943029
DOIs: https://doi.org/10.1109/ICDM.2014.96
Publication status: Published - 2014 Jan 1
Event: 14th IEEE International Conference on Data Mining, ICDM 2014 - Shenzhen, China
Duration: 2014 Dec 14 to 2014 Dec 17

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
Number: January
Volume: 2015-January
ISSN (Print): 1550-4786

Other

Other: 14th IEEE International Conference on Data Mining, ICDM 2014
Country: China
City: Shenzhen
Period: 14/12/14 to 14/12/17

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Cite this

Lim, M. H., Lou, J. G., Zhang, H., Fu, Q., Teoh, A. B. J., Lin, Q., ... Zhang, D. (2014). Identifying Recurrent and Unknown Performance Issues. In R. Kumar, H. Toivonen, J. Pei, J. Z. Huang, & X. Wu (Eds.), Proceedings - 14th IEEE International Conference on Data Mining, ICDM 2014 (January ed., pp. 320-329). [7023349] (Proceedings - IEEE International Conference on Data Mining, ICDM; Vol. 2015-January, No. January). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICDM.2014.96
Lim, Meng Hui ; Lou, Jian Guang ; Zhang, Hongyu ; Fu, Qiang ; Teoh, Andrew Beng Jin ; Lin, Qingwei ; Ding, Rui ; Zhang, Dongmei. / Identifying Recurrent and Unknown Performance Issues. Proceedings - 14th IEEE International Conference on Data Mining, ICDM 2014. editor / Ravi Kumar ; Hannu Toivonen ; Jian Pei ; Joshua Zhexue Huang ; Xindong Wu. January. ed. Institute of Electrical and Electronics Engineers Inc., 2014. pp. 320-329 (Proceedings - IEEE International Conference on Data Mining, ICDM; January).
@inproceedings{61898eb3b430405288c911738e744554,
title = "Identifying Recurrent and Unknown Performance Issues",
author = "Lim, {Meng Hui} and Lou, {Jian Guang} and Hongyu Zhang and Qiang Fu and Teoh, {Andrew Beng Jin} and Qingwei Lin and Rui Ding and Dongmei Zhang",
year = "2014",
month = "1",
day = "1",
doi = "10.1109/ICDM.2014.96",
language = "English",
series = "Proceedings - IEEE International Conference on Data Mining, ICDM",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "January",
pages = "320--329",
editor = "Ravi Kumar and Hannu Toivonen and Jian Pei and Huang, Joshua Zhexue and Xindong Wu",
booktitle = "Proceedings - 14th IEEE International Conference on Data Mining, ICDM 2014",
address = "United States",
edition = "January",

}

Lim, MH, Lou, JG, Zhang, H, Fu, Q, Teoh, ABJ, Lin, Q, Ding, R & Zhang, D 2014, Identifying Recurrent and Unknown Performance Issues. in R Kumar, H Toivonen, J Pei, JZ Huang & X Wu (eds), Proceedings - 14th IEEE International Conference on Data Mining, ICDM 2014. January edn, 7023349, Proceedings - IEEE International Conference on Data Mining, ICDM, no. January, vol. 2015-January, Institute of Electrical and Electronics Engineers Inc., pp. 320-329, 14th IEEE International Conference on Data Mining, ICDM 2014, Shenzhen, China, 14/12/14. https://doi.org/10.1109/ICDM.2014.96

Identifying Recurrent and Unknown Performance Issues. / Lim, Meng Hui; Lou, Jian Guang; Zhang, Hongyu; Fu, Qiang; Teoh, Andrew Beng Jin; Lin, Qingwei; Ding, Rui; Zhang, Dongmei.

Proceedings - 14th IEEE International Conference on Data Mining, ICDM 2014. ed. / Ravi Kumar; Hannu Toivonen; Jian Pei; Joshua Zhexue Huang; Xindong Wu. January. ed. Institute of Electrical and Electronics Engineers Inc., 2014. p. 320-329 7023349 (Proceedings - IEEE International Conference on Data Mining, ICDM; Vol. 2015-January, No. January).

TY  - GEN
T1  - Identifying Recurrent and Unknown Performance Issues
AU  - Lim, Meng Hui
AU  - Lou, Jian Guang
AU  - Zhang, Hongyu
AU  - Fu, Qiang
AU  - Teoh, Andrew Beng Jin
AU  - Lin, Qingwei
AU  - Ding, Rui
AU  - Zhang, Dongmei
PY  - 2014/1/1
Y1  - 2014/1/1
UR  - http://www.scopus.com/inward/record.url?scp=84936948357&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84936948357&partnerID=8YFLogxK
U2  - 10.1109/ICDM.2014.96
DO  - 10.1109/ICDM.2014.96
M3  - Conference contribution
AN  - SCOPUS:84936948357
T3  - Proceedings - IEEE International Conference on Data Mining, ICDM
SP  - 320
EP  - 329
BT  - Proceedings - 14th IEEE International Conference on Data Mining, ICDM 2014
A2  - Kumar, Ravi
A2  - Toivonen, Hannu
A2  - Pei, Jian
A2  - Huang, Joshua Zhexue
A2  - Wu, Xindong
PB  - Institute of Electrical and Electronics Engineers Inc.
ER  - 

Lim MH, Lou JG, Zhang H, Fu Q, Teoh ABJ, Lin Q et al. Identifying Recurrent and Unknown Performance Issues. In Kumar R, Toivonen H, Pei J, Huang JZ, Wu X, editors, Proceedings - 14th IEEE International Conference on Data Mining, ICDM 2014. January ed. Institute of Electrical and Electronics Engineers Inc. 2014. p. 320-329. 7023349. (Proceedings - IEEE International Conference on Data Mining, ICDM; January). https://doi.org/10.1109/ICDM.2014.96