Managing performance-reliability tradeoffs in multicore processors

William Jinho Song, Saibal Mukhopadhyay, Sudhakar Yalamanchili

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Citations (Scopus)

Abstract

There is a fundamental tradeoff between processor performance and lifetime reliability. High-throughput operation increases power and heat dissipation, which adversely affects lifetime reliability. Conversely, lifetime reliability favors low utilization to reduce stress and avoid failures. A key challenge in understanding this tradeoff is connecting application characteristics to device-level degradation behavior. Using full-system microarchitecture and physics simulation, the performance-reliability tradeoff in a multicore processor is analyzed by introducing a metric, the throughput-lifetime product (TLP). The analysis finds that reducing the variance of the degradation distribution across the multicore die effectively extends processor lifetime with minimal impact on performance. This concept is referred to as dynamic reliability variance management (DRVM). We discuss three possible microarchitectural techniques that perform DRVM and improve the TLP: i) phase-aware thread migration, ii) dynamic voltage scaling, and iii) turbo-mode execution combined with DRVM. Simulation results with selected PARSEC and SPLASH-2 benchmarks show that DRVM techniques improve processor lifetime by up to 15% or enhance the throughput-lifetime tradeoff by 12% without adding extra design margins or spare components on the multicore die.
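
The link between degradation variance and lifetime can be illustrated with a toy calculation. The sketch below is not the authors' simulator. It assumes (hypothetically) that TLP is simply throughput multiplied by lifetime, that each core degrades at a constant rate, and that the processor fails as soon as its most degraded core reaches a threshold. All function names and numbers are made up for illustration.

```python
from statistics import pvariance


def processor_lifetime(core_rates, failure_threshold=1.0):
    """Years until the fastest-degrading core hits the threshold.

    Assumption: the die fails when its worst core fails (series system).
    """
    return failure_threshold / max(core_rates)


def throughput_lifetime_product(throughput, lifetime):
    """Assumed form of the TLP metric: a plain product."""
    return throughput * lifetime


def balance_rates(core_rates):
    """Idealized variance management: same total stress, spread evenly."""
    mean = sum(core_rates) / len(core_rates)
    return [mean] * len(core_rates)


# Hypothetical per-core degradation rates (fraction of threshold per year).
unbalanced = [0.10, 0.25, 0.15, 0.30]
balanced = balance_rates(unbalanced)

# Hypothetical throughput, assumed unchanged by balancing.
throughput = 4.0

lifetime_unbalanced = processor_lifetime(unbalanced)
lifetime_balanced = processor_lifetime(balanced)

tlp_unbalanced = throughput_lifetime_product(throughput, lifetime_unbalanced)
tlp_balanced = throughput_lifetime_product(throughput, lifetime_balanced)

print(f"variance: {pvariance(unbalanced):.4f} -> {pvariance(balanced):.4f}")
print(f"lifetime: {lifetime_unbalanced:.2f} -> {lifetime_balanced:.2f} years")
print(f"TLP:      {tlp_unbalanced:.2f} -> {tlp_balanced:.2f}")
```

Under these assumptions, total stress is unchanged, yet lifetime grows because the worst core no longer degrades faster than the rest. Real techniques such as thread migration or voltage scaling would also affect throughput, which this sketch ignores.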

Original language: English
Title of host publication: 2015 IEEE International Reliability Physics Symposium, IRPS 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3C11-3C17
ISBN (Electronic): 9781467373623
DOIs: https://doi.org/10.1109/IRPS.2015.7112707
Publication status: Published - 2015 May 26
Event: IEEE International Reliability Physics Symposium, IRPS 2015 - Monterey, United States
Duration: 2015 Apr 19 to 2015 Apr 23

Publication series

Name: IEEE International Reliability Physics Symposium Proceedings
Volume: 2015-May
ISSN (Print): 1541-7026

Other

Other: IEEE International Reliability Physics Symposium, IRPS 2015
Country: United States
City: Monterey
Period: 2015 Apr 19 to 2015 Apr 23

Fingerprint

Throughput
Degradation
Heat losses
Energy dissipation
Physics

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Cite this

Song, W. J., Mukhopadhyay, S., & Yalamanchili, S. (2015). Managing performance-reliability tradeoffs in multicore processors. In 2015 IEEE International Reliability Physics Symposium, IRPS 2015 (pp. 3C11-3C17). [7112707] (IEEE International Reliability Physics Symposium Proceedings; Vol. 2015-May). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IRPS.2015.7112707
Song, William Jinho ; Mukhopadhyay, Saibal ; Yalamanchili, Sudhakar. / Managing performance-reliability tradeoffs in multicore processors. 2015 IEEE International Reliability Physics Symposium, IRPS 2015. Institute of Electrical and Electronics Engineers Inc., 2015. pp. 3C11-3C17 (IEEE International Reliability Physics Symposium Proceedings).
@inproceedings{3b9ae22045d04774979dbd56536a8438,
title = "Managing performance-reliability tradeoffs in multicore processors",
author = "Song, {William Jinho} and Saibal Mukhopadhyay and Sudhakar Yalamanchili",
year = "2015",
month = "5",
day = "26",
doi = "10.1109/IRPS.2015.7112707",
language = "English",
series = "IEEE International Reliability Physics Symposium Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "3C11--3C17",
booktitle = "2015 IEEE International Reliability Physics Symposium, IRPS 2015",
address = "United States",

}

Song, WJ, Mukhopadhyay, S & Yalamanchili, S 2015, Managing performance-reliability tradeoffs in multicore processors. in 2015 IEEE International Reliability Physics Symposium, IRPS 2015., 7112707, IEEE International Reliability Physics Symposium Proceedings, vol. 2015-May, Institute of Electrical and Electronics Engineers Inc., pp. 3C11-3C17, IEEE International Reliability Physics Symposium, IRPS 2015, Monterey, United States, 15/4/19. https://doi.org/10.1109/IRPS.2015.7112707

TY - GEN

T1 - Managing performance-reliability tradeoffs in multicore processors

AU - Song, William Jinho

AU - Mukhopadhyay, Saibal

AU - Yalamanchili, Sudhakar

PY - 2015/5/26

Y1 - 2015/5/26

UR - http://www.scopus.com/inward/record.url?scp=84942907852&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84942907852&partnerID=8YFLogxK

U2 - 10.1109/IRPS.2015.7112707

DO - 10.1109/IRPS.2015.7112707

M3 - Conference contribution

T3 - IEEE International Reliability Physics Symposium Proceedings

SP - 3C11-3C17

BT - 2015 IEEE International Reliability Physics Symposium, IRPS 2015

PB - Institute of Electrical and Electronics Engineers Inc.

ER -

Song WJ, Mukhopadhyay S, Yalamanchili S. Managing performance-reliability tradeoffs in multicore processors. In 2015 IEEE International Reliability Physics Symposium, IRPS 2015. Institute of Electrical and Electronics Engineers Inc. 2015. p. 3C11-3C17. 7112707. (IEEE International Reliability Physics Symposium Proceedings). https://doi.org/10.1109/IRPS.2015.7112707