Multi-failure detection using device hierarchical attention network

Sangjun An, Mintae Kim, Wooju Kim

Research output: Contribution to journal > Article > peer-review


With rapid developments in the information industry, data centers have become increasingly important for collecting and storing data. The devices in data centers are not only connected to external machines to provide a variety of services but also store vast amounts of data, so device failures in data centers can cause severe, potentially fatal economic damage. Various methods have been studied in recent years to effectively predict failures in connected devices. However, in data center-scale systems, failures of any individual device occur infrequently, which makes per-device failure prediction difficult. In addition, complex failures may occur within the data center owing to a mix of devices and systems, and it is difficult to determine the cause of failure in such cases. In this study, we present a device hierarchical attention network (DHAN) methodology that can predict the failures of all devices by simultaneously using the existing information from the devices in the data center. Because the devices in the data center could potentially affect each other, the information from these devices is used in a composite manner. Compared with failure prediction that uses information from a single device, this composite approach was observed to predict failures more effectively. In addition, by extracting attention information from the DHAN model, we identified the devices that play an important role in predicting the failure of a particular device. We then used this information to cluster the devices and reconstruct the DHAN model, which predicted failures more effectively. Based on the results presented herein, it is expected that, with the proposed approach, the system can be stably maintained and repaired by identifying the potential impact of the devices.

Original language: English
Article number: 117277
Journal: Expert Systems with Applications
Publication status: Published - 2022 Oct 1

Bibliographical note

Publisher Copyright:
© 2022 Elsevier Ltd

All Science Journal Classification (ASJC) codes

  • Engineering(all)
  • Computer Science Applications
  • Artificial Intelligence
