Linebacker: Preserving victim cache lines in idle register files of GPUs

Yunho Oh, Gunjae Koo, Murali Annavaram, Won Woo Ro

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Modern GPUs suffer from cache contention due to the limited cache size, which is shared across tens of concurrently running warps. To increase the per-warp cache size, prior techniques proposed warp throttling, which limits the number of active warps. Whenever a warp is throttled, however, warp throttling leaves several registers dynamically unused. Given the stringent cache size limitation in GPUs, this work proposes a new cache management technique named Linebacker (LB) that improves GPU performance by utilizing idle register file space as victim cache space. Whenever a CTA (cooperative thread array) becomes inactive, Linebacker backs up the registers of the throttled CTA to off-chip memory. Linebacker then utilizes the corresponding register file space as victim cache space. If a load instruction finds its data in a victim cache line, the data is copied directly to the destination register through a simple register-to-register move operation. To further improve the efficiency of the victim cache, Linebacker allocates victim cache space only to a select few load instructions that exhibit high data locality. Through a careful design of the victim cache indexing and management scheme, Linebacker provides a 29.0% speedup compared to previously proposed warp throttling techniques.
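To make the mechanism described in the abstract more concrete, the Python sketch below models idle register-file space as a victim cache whose allocation is restricted to high-locality load instructions. It is a toy under stated assumptions, not the authors' design: the class and method names (`VictimCacheSketch`, `throttle_cta`, `reactivate_cta`, `note_l1_hit`, `on_l1_evict`, `lookup`), the fully associative LRU replacement, and the per-PC reuse counter used as the "high data locality" test are all hypothetical, since the abstract does not describe the paper's actual indexing and management scheme.

```python
from collections import OrderedDict, defaultdict


class VictimCacheSketch:
    """Toy model: idle register-file space reused as a victim cache.

    Hypothetical simplifications (not from the paper): fully associative
    lookup, LRU replacement, and a per-PC reuse counter as locality filter.
    """

    def __init__(self, reuse_threshold=2):
        self.capacity = 0                  # lines currently backed by idle registers
        self.lines = OrderedDict()         # block address -> data, oldest first
        self.reuse_threshold = reuse_threshold
        self.pc_reuse = defaultdict(int)   # hypothetical per-load locality monitor

    def throttle_cta(self, lines_freed):
        # Assumes the throttled CTA's registers were already backed up
        # off-chip; their space now becomes victim cache capacity.
        self.capacity += lines_freed

    def reactivate_cta(self, lines_reclaimed):
        # Registers are returned to the CTA; drop LRU lines that no longer fit.
        self.capacity = max(0, self.capacity - lines_reclaimed)
        while len(self.lines) > self.capacity:
            self.lines.popitem(last=False)

    def note_l1_hit(self, pc):
        # Count observed reuse for the load instruction at this PC.
        self.pc_reuse[pc] += 1

    def on_l1_evict(self, pc, addr, data):
        # Preserve an evicted L1 line only if its load PC shows enough reuse.
        if self.capacity == 0 or self.pc_reuse[pc] < self.reuse_threshold:
            return False
        if addr in self.lines:
            self.lines.move_to_end(addr)
        elif len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[addr] = data
        return True

    def lookup(self, addr, regfile, dst_reg):
        # On an L1 miss, a victim hit is modeled as a register-to-register move.
        if addr not in self.lines:
            return False
        self.lines.move_to_end(addr)
        regfile[dst_reg] = self.lines[addr]
        return True
```

In this sketch, throttling a CTA grows the victim capacity and reactivating it shrinks the capacity again, discarding the least recently used lines first. The per-PC counter stands in for whatever locality detection the real design uses; it only illustrates the idea of reserving scarce victim space for the few loads likely to reuse their data.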

Original language: English
Title of host publication: ISCA 2019 - Proceedings of the 2019 46th International Symposium on Computer Architecture
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 183-196
Number of pages: 14
ISBN (Electronic): 9781450366694
DOIs: https://doi.org/10.1145/3307650.3322222
Publication status: Published - 2019 Jun 22
Event: 46th International Symposium on Computer Architecture, ISCA 2019 - Phoenix, United States
Duration: 2019 Jun 22 to 2019 Jun 26

Publication series

Name: Proceedings - International Symposium on Computer Architecture
ISSN (Print): 1063-6897

Conference

Conference: 46th International Symposium on Computer Architecture, ISCA 2019
Country: United States
City: Phoenix
Period: 19/6/22 to 19/6/26

Fingerprint

Data storage equipment
Graphics processing unit

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture

Cite this

Oh, Y., Koo, G., Annavaram, M., & Ro, W. W. (2019). Linebacker: Preserving victim cache lines in idle register files of GPUs. In ISCA 2019 - Proceedings of the 2019 46th International Symposium on Computer Architecture (pp. 183-196). (Proceedings - International Symposium on Computer Architecture). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3307650.3322222
Oh, Yunho ; Koo, Gunjae ; Annavaram, Murali ; Ro, Won Woo. / Linebacker: Preserving victim cache lines in idle register files of GPUs. ISCA 2019 - Proceedings of the 2019 46th International Symposium on Computer Architecture. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 183-196 (Proceedings - International Symposium on Computer Architecture).
@inproceedings{f1ce80d5f77f4a53b2142ba835ca88ad,
title = "Linebacker: Preserving victim cache lines in idle register files of GPUs",
abstract = "Modern GPUs suffer from cache contention due to the limited cache size that is shared across tens of concurrently running warps. To increase the per-warp cache size prior techniques proposed warp throttling which limits the number of active warps. Warp throttling leaves several registers to be dynamically unused whenever a warp is throttled. Given the stringent cache size limitation in GPUs this work proposes a new cache management technique named Linebacker (LB) that improves GPU performance by utilizing idle register file space as victim cache space. Whenever a CTA becomes inactive, linebacker backs up the registers of the throttled CTA to the off-chip memory. Then, linebacker utilizes the corresponding register file space as victim cache space. If any load instruction finds data in the victim cache line, the data is directly copied to the destination register through a simple register-register move operation. To further improve the efficiency of victim cache linebacker allocates victim cache space only to a select few load instructions that exhibit high data locality. Through a careful design of victim cache indexing and management scheme linebacker provides 29.0{\%} of speedup compared to the previously proposed warp throttling techniques.",
author = "Yunho Oh and Gunjae Koo and Murali Annavaram and Ro, {Won Woo}",
year = "2019",
month = "6",
day = "22",
doi = "10.1145/3307650.3322222",
language = "English",
series = "Proceedings - International Symposium on Computer Architecture",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "183--196",
booktitle = "ISCA 2019 - Proceedings of the 2019 46th International Symposium on Computer Architecture",
address = "United States",

}

Oh, Y, Koo, G, Annavaram, M & Ro, WW 2019, Linebacker: Preserving victim cache lines in idle register files of GPUs. in ISCA 2019 - Proceedings of the 2019 46th International Symposium on Computer Architecture. Proceedings - International Symposium on Computer Architecture, Institute of Electrical and Electronics Engineers Inc., pp. 183-196, 46th International Symposium on Computer Architecture, ISCA 2019, Phoenix, United States, 19/6/22. https://doi.org/10.1145/3307650.3322222

Linebacker: Preserving victim cache lines in idle register files of GPUs. / Oh, Yunho; Koo, Gunjae; Annavaram, Murali; Ro, Won Woo.

ISCA 2019 - Proceedings of the 2019 46th International Symposium on Computer Architecture. Institute of Electrical and Electronics Engineers Inc., 2019. p. 183-196 (Proceedings - International Symposium on Computer Architecture).


TY - GEN

T1 - Linebacker

T2 - Preserving victim cache lines in idle register files of GPUs

AU - Oh, Yunho

AU - Koo, Gunjae

AU - Annavaram, Murali

AU - Ro, Won Woo

PY - 2019/6/22

Y1 - 2019/6/22

N2 - Modern GPUs suffer from cache contention due to the limited cache size that is shared across tens of concurrently running warps. To increase the per-warp cache size prior techniques proposed warp throttling which limits the number of active warps. Warp throttling leaves several registers to be dynamically unused whenever a warp is throttled. Given the stringent cache size limitation in GPUs this work proposes a new cache management technique named Linebacker (LB) that improves GPU performance by utilizing idle register file space as victim cache space. Whenever a CTA becomes inactive, linebacker backs up the registers of the throttled CTA to the off-chip memory. Then, linebacker utilizes the corresponding register file space as victim cache space. If any load instruction finds data in the victim cache line, the data is directly copied to the destination register through a simple register-register move operation. To further improve the efficiency of victim cache linebacker allocates victim cache space only to a select few load instructions that exhibit high data locality. Through a careful design of victim cache indexing and management scheme linebacker provides 29.0% of speedup compared to the previously proposed warp throttling techniques.

AB - Modern GPUs suffer from cache contention due to the limited cache size that is shared across tens of concurrently running warps. To increase the per-warp cache size prior techniques proposed warp throttling which limits the number of active warps. Warp throttling leaves several registers to be dynamically unused whenever a warp is throttled. Given the stringent cache size limitation in GPUs this work proposes a new cache management technique named Linebacker (LB) that improves GPU performance by utilizing idle register file space as victim cache space. Whenever a CTA becomes inactive, linebacker backs up the registers of the throttled CTA to the off-chip memory. Then, linebacker utilizes the corresponding register file space as victim cache space. If any load instruction finds data in the victim cache line, the data is directly copied to the destination register through a simple register-register move operation. To further improve the efficiency of victim cache linebacker allocates victim cache space only to a select few load instructions that exhibit high data locality. Through a careful design of victim cache indexing and management scheme linebacker provides 29.0% of speedup compared to the previously proposed warp throttling techniques.

UR - http://www.scopus.com/inward/record.url?scp=85069452030&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85069452030&partnerID=8YFLogxK

U2 - 10.1145/3307650.3322222

DO - 10.1145/3307650.3322222

M3 - Conference contribution

AN - SCOPUS:85069452030

T3 - Proceedings - International Symposium on Computer Architecture

SP - 183

EP - 196

BT - ISCA 2019 - Proceedings of the 2019 46th International Symposium on Computer Architecture

PB - Institute of Electrical and Electronics Engineers Inc.

ER -

Oh Y, Koo G, Annavaram M, Ro WW. Linebacker: Preserving victim cache lines in idle register files of GPUs. In ISCA 2019 - Proceedings of the 2019 46th International Symposium on Computer Architecture. Institute of Electrical and Electronics Engineers Inc. 2019. p. 183-196. (Proceedings - International Symposium on Computer Architecture). https://doi.org/10.1145/3307650.3322222