Deep neural networks (DNNs) are deployed on hardware devices and are widely used in various fields to perform inference. Unfortunately, hardware devices can become unreliable due to unintended process, voltage, and temperature variations, which can introduce erroneous weights. A prior study reports that erroneous weights can cause significant accuracy degradation, and in safety-critical applications such as autonomous driving, this degradation can lead to catastrophic results. Retraining or fine-tuning can adjust corrupted weights to prevent accuracy degradation; however, such training-based approaches incur significant computational overhead because of massive training datasets and intensive training operations. Thus, this paper proposes a value-aware parity insertion error correction code (ECC) that recovers erroneous weights with reduced parity storage overhead and without any additional training. The proposed method is compared with two previous ECC-based reliability improvement methods, Weight Nulling and In-place Zero-space ECC. Experimental results demonstrate that DNNs protected by the value-aware parity insertion ECC can perform inference without accuracy degradation at bit error rates that are, on average, 122.5× and 15.1× higher than those tolerated by Weight Nulling and In-place Zero-space ECC, respectively.
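The abstract does not describe the encoding of the value-aware parity insertion ECC, so the following is only a generic illustration of the underlying idea of storing check bits inside weights rather than in separate parity memory. It is not the paper's method. It assumes int8 two's-complement weights restricted to [-64, 63]. In that range bit 6 always equals the sign bit, so bit 6 can hold one even-parity bit over the other seven bits. A single parity bit only detects single-bit errors; it does not correct them. All function names (`to_signed`, `parity7`, `encode`, `decode`) are hypothetical.

```python
from typing import Tuple


def to_signed(u: int) -> int:
    """Interpret an unsigned byte (0..255) as a two's-complement int8."""
    return u - 256 if u >= 128 else u


def parity7(u: int) -> int:
    """Even parity over bits 0-5 and bit 7 (bit 6 is excluded via mask 0xBF)."""
    return bin(u & 0xBF).count("1") & 1


def encode(w: int) -> int:
    """Hypothetical in-place parity: overwrite redundant bit 6 with a parity bit."""
    if not -64 <= w <= 63:
        raise ValueError("weight outside [-64, 63]; bit 6 is not redundant")
    u = w & 0xFF                          # unsigned byte view of the weight
    return (u & ~0x40) | (parity7(u) << 6)


def decode(u: int) -> Tuple[int, bool]:
    """Return (weight, error_detected) for a stored byte."""
    error = ((u >> 6) & 1) != parity7(u)  # any single-bit flip breaks parity
    sign = (u >> 7) & 1
    u = (u & ~0x40) | (sign << 6)         # restore bit 6 as a sign-extension copy
    return to_signed(u), error


if __name__ == "__main__":
    # Unprotected: flipping the MSB of weight 3 turns it into -125.
    print(to_signed(3 ^ 0x80))            # -125
    stored = encode(3)
    print(decode(stored))                 # (3, False)
    print(decode(stored ^ 0x80))          # (-61, True): error is flagged
```

The unprotected case shows why a single high-order bit flip can shift a weight far outside its trained range. The protected case does not recover the value. It only flags the corrupted weight, and how a flagged weight is then handled is left open here.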
|Title of host publication||Proceedings of the 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022|
|Editors||Cristiana Bolchini, Ingrid Verbauwhede, Ioana Vatajelu|
|Publisher||Institute of Electrical and Electronics Engineers Inc.|
|Number of pages||6|
|Publication status||Published - 2022|
|Event||2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022 - Virtual, Online, Belgium|
|Duration||2022 Mar 14 → 2022 Mar 23|
Bibliographical note
Funding Information:
This work was supported by Samsung Electronics Co., Ltd. under project number 10201208-07834-01.
© 2022 EDAA.
All Science Journal Classification (ASJC) codes
- Artificial Intelligence
- Computer Networks and Communications
- Hardware and Architecture
- Safety, Risk, Reliability and Quality
- Control and Optimization