A compiler-based approach for GPGPU performance calibration using TLP modulation (WIP Paper)

Yongseung Yu, Seokwon Kang, Yongjun Park

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Modern GPUs are the most successful accelerators, as they provide outstanding performance gains through the CUDA or OpenCL programming models. For maximum performance, programmers typically try to maximize the number of thread blocks of target programs, and GPUs also generally attempt to allocate the maximum number of thread blocks to their GPU cores. However, many recent studies have pointed out that simply allocating the maximum number of thread blocks to GPU cores does not always guarantee the best performance, and identifying the proper number of thread blocks per GPU core is a major challenge. Despite these studies, most existing architectural techniques cannot be directly applied to current GPU hardware, and the optimal number of thread blocks can vary significantly depending on the target GPU and application characteristics. To solve these problems, this study proposes a just-in-time thread block number adjustment system, referred to as the CTA Limiter, which uses CUDA binary modification on top of the LLVM compiler framework to dynamically maximize GPU performance on real GPUs without reprogramming. The framework gradually reduces the number of concurrent thread blocks of target CUDA workloads by allocating extra shared memory, and compares each execution time with that of the previous version to automatically identify the optimal number of co-running thread blocks per GPU core. The results showed meaningful performance improvements, averaging 30%, 40%, and 44% on the GTX 960, GTX 1050, and GTX 1080 Ti, respectively.
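The calibration idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, which modifies CUDA binaries through LLVM; it only models the arithmetic and the search loop. The function names (`padding_for_limit`, `calibrate`), the `time_kernel` callback, the 96 KiB shared-memory capacity, and the rule of stopping at the first slowdown are all assumptions made for illustration. The sketch also ignores shared-memory allocation granularity and per-block limits that real GPUs impose.

```python
from typing import Callable


def padding_for_limit(smem_per_sm: int, kernel_smem: int, target_blocks: int) -> int:
    """Extra shared memory (bytes) per block so that at most
    `target_blocks` blocks fit on one GPU core (hypothetical helper)."""
    if target_blocks < 1:
        raise ValueError("target_blocks must be at least 1")
    # A block must use more than smem_per_sm / (target_blocks + 1) bytes,
    # otherwise one additional block would still fit on the core.
    needed = smem_per_sm // (target_blocks + 1) + 1
    # If the kernel already uses that much, no padding is required.
    return max(0, needed - kernel_smem)


def calibrate(max_blocks: int, time_kernel: Callable[[int], float]) -> int:
    """Lower the block limit one step at a time and keep the fastest setting.

    `time_kernel(blocks)` is an assumed callback returning the measured
    execution time when `blocks` thread blocks co-run per core.
    """
    best_blocks = max_blocks
    best_time = time_kernel(max_blocks)
    for blocks in range(max_blocks - 1, 0, -1):
        elapsed = time_kernel(blocks)
        if elapsed >= best_time:
            break  # assumption: stop once a version is no faster than the previous one
        best_blocks, best_time = blocks, elapsed
    return best_blocks


# Synthetic timings (not measured data): performance peaks at 6 blocks per core.
fake_times = {8: 10.0, 7: 9.0, 6: 8.5, 5: 9.2, 4: 9.9, 3: 11.0, 2: 13.0, 1: 18.0}
best = calibrate(8, lambda b: fake_times[b])
pad = padding_for_limit(96 * 1024, 0, best)
```

With the synthetic timings above, the loop settles on 6 blocks per core, and `pad` is the per-block padding that would enforce that limit on a core with 96 KiB of shared memory under the simplified model.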

Original language: English
Title of host publication: LCTES 2019 - Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, co-located with PLDI 2019
Editors: Jian-Jia Chen, Aviral Shrivastava
Publisher: Association for Computing Machinery
Pages: 193-197
Number of pages: 5
ISBN (Electronic): 9781450367240
Publication status: Published - 2019 Jun 23
Event: 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, LCTES 2019, co-located with PLDI 2019 - Phoenix, United States
Duration: 2019 Jun 23 → …

Publication series

Name: Proceedings of the ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES)

Conference

Conference: 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, LCTES 2019, co-located with PLDI 2019
Country/Territory: United States
City: Phoenix
Period: 19/6/23 → …

Bibliographical note

Funding Information:
This work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2015R1C1A1A01053844) and the ICT R&D program of MSIP/IITP (No. 2017-0-00142). Yongjun Park is the corresponding author.

Publisher Copyright:
© 2019 Association for Computing Machinery.

All Science Journal Classification (ASJC) codes

  • Software
