CIAO: Cache interference-aware throughput-oriented architecture and scheduling for GPUs

Jie Zhang, Shuwen Gao, Nam Sung Kim, Myoungsoo Jung

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

A modern GPU aims to simultaneously execute more warps for higher Thread-Level Parallelism (TLP) and performance. When they generate many memory requests, however, warps contend for limited cache space and thrash the cache, which in turn severely degrades performance. To reduce such cache thrashing, we may adopt cache locality-aware warp scheduling, which gives higher execution priority to warps with higher potential for data locality. However, we observe that warps with high potential for data locality often incur far more cache thrashing or interference than warps with low potential for data locality. Consequently, cache locality-aware warp scheduling may undesirably increase cache interference and/or unnecessarily decrease TLP. In this paper, we propose the Cache Interference-Aware throughput-Oriented (CIAO) on-chip memory architecture and warp scheduling, which exploit unused shared memory space and take the opposite approach to cache locality-aware warp scheduling. Specifically, the CIAO on-chip memory architecture can adaptively redirect the memory requests of severely interfering warps to unused shared memory space, isolating them from the requests of the warps they interfere with. If these interfering warps still incur severe cache interference, CIAO warp scheduling then begins to selectively throttle their execution. Our experiments show that CIAO can offer 54% higher performance than prior cache locality-aware scheduling at a small chip cost.
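
The abstract describes a two-stage policy: first isolate a severely interfering warp's requests in unused shared memory, and only throttle its execution if interference persists. The sketch below illustrates that decision flow under stated assumptions; the interference metric, thresholds, and capacity accounting are illustrative inventions for exposition, not the paper's actual hardware mechanism.

```python
# Minimal sketch of the two-stage CIAO policy as described in the abstract.
# All names, thresholds, and metrics here are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class WarpState:
    warp_id: int
    interference: float = 0.0   # assumed metric: cache thrashing this warp inflicts on others
    redirected: bool = False    # requests currently served from spare shared memory
    throttled: bool = False     # execution deprioritized by the warp scheduler

REDIRECT_THRESHOLD = 0.5  # hypothetical tuning knobs, not from the paper
THROTTLE_THRESHOLD = 0.8

def ciao_step(warps: list[WarpState], shared_mem_free: int, lines_per_warp: int) -> int:
    """One scheduling epoch: redirect first, throttle only if interference persists."""
    for w in sorted(warps, key=lambda w: w.interference, reverse=True):
        if w.interference < REDIRECT_THRESHOLD:
            continue  # warp does not interfere enough to act on
        if not w.redirected and shared_mem_free >= lines_per_warp:
            # Stage 1: isolate the interfering warp in unused shared memory space.
            w.redirected = True
            shared_mem_free -= lines_per_warp
        elif w.redirected and w.interference >= THROTTLE_THRESHOLD:
            # Stage 2: redirection was not enough; selectively throttle this warp.
            w.throttled = True
    return shared_mem_free
```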

Original language: English
Title of host publication: Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium, IPDPS 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 149-159
Number of pages: 11
ISBN (Print): 9781538643686
DOI: 10.1109/IPDPS.2018.00025
Publication status: Published - 2018 Aug 3
Event: 32nd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2018 - Vancouver, Canada
Duration: 2018 May 21 – 2018 May 25

Publication series

Name: Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium, IPDPS 2018

Conference

Conference: 32nd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2018
Country: Canada
City: Vancouver
Period: 18/5/21 – 18/5/25

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems and Management

Cite this

Zhang, J., Gao, S., Kim, N. S., & Jung, M. (2018). CIAO: Cache interference-aware throughput-oriented architecture and scheduling for GPUs. In Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium, IPDPS 2018 (pp. 149-159). [8425169] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IPDPS.2018.00025