PreScaler: An efficient system-aware precision scaling framework on heterogeneous systems

Seokwon Kang, Kyunghwan Choi, Yongjun Park

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Graphics processing units (GPUs) have been commonly utilized to accelerate multiple emerging applications, such as big data processing and machine learning. While GPUs are proven to be effective, approximate computing, which trades accuracy for performance, is one of the most common solutions for further performance improvement. Precision scaling of originally high-precision values into lower-precision values has recently been the most widely used GPU-side approximation technique, including hardware-level half-precision support. Although several approaches to finding the optimal mixed-precision configuration of GPU-side kernels have been introduced, the total program performance gain is often low because total execution time is the combination of data transfer, type conversion, and kernel execution. As a result, kernel-level scaling may incur high type-conversion overhead on the kernel input/output data. To address this problem, this paper proposes an automatic precision scaling framework called PreScaler that maximizes program performance at the memory-object level by considering whole OpenCL program flows. The main difficulty is that the best configuration cannot be easily predicted due to various application- and system-specific characteristics. PreScaler solves this problem using search-space minimization and decision-tree-based search processes. First, it minimizes the number of test configurations based on information from system inspection and dynamic profiling. Then, it finds the best memory-object-level mixed-precision configuration using a decision-tree-based search. PreScaler achieves an average performance gain of 1.33x over the baseline while maintaining the target output quality level.
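The memory-object-level trade-off the abstract describes can be sketched as follows. This is a minimal, hypothetical model (buffer names, timings, and error values are invented for illustration): each memory object's precision affects transfer time and conversion overhead, while the chosen configuration as a whole determines kernel time and output error. PreScaler's actual cost model, system inspection, and decision-tree search are far more involved; the exhaustive loop below merely shows why per-object selection can beat kernel-only scaling.

```python
from itertools import product

# Hypothetical per-buffer timings (ms). Halving precision halves transfer
# time but adds a type-conversion step for that buffer.
buffers = {
    "A": {"xfer_fp32": 4.0, "xfer_fp16": 2.0, "convert": 0.5},
    "B": {"xfer_fp32": 6.0, "xfer_fp16": 3.0, "convert": 0.7},
}
kernel_time = {  # kernel time for each (prec_A, prec_B) configuration
    ("fp32", "fp32"): 10.0, ("fp16", "fp32"): 9.0,
    ("fp32", "fp16"): 8.5,  ("fp16", "fp16"): 7.0,
}
error = {  # hypothetical output error for each configuration
    ("fp32", "fp32"): 0.0,  ("fp16", "fp32"): 0.01,
    ("fp32", "fp16"): 0.02, ("fp16", "fp16"): 0.08,
}
TOQ = 0.05  # target output quality: maximum tolerable error

def total_time(cfg):
    """Total time = kernel + per-buffer transfer (+ conversion if scaled)."""
    t = kernel_time[cfg]
    for info, prec in zip(buffers.values(), cfg):
        if prec == "fp16":
            t += info["xfer_fp16"] + info["convert"]
        else:
            t += info["xfer_fp32"]
    return t

# Exhaustive search over per-object precision assignments, keeping only
# configurations that meet the target output quality.
feasible = [c for c in product(("fp32", "fp16"), repeat=len(buffers))
            if error[c] <= TOQ]
best = min(feasible, key=total_time)
print(best, total_time(best))  # → ('fp32', 'fp16') 16.2
```

Note that the all-fp16 configuration is fastest in raw time (13.2 ms) but violates the quality target, and scaling buffer A alone loses to scaling only buffer B once its conversion overhead is counted — exactly the kind of whole-program accounting the framework automates.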

Original language: English
Title of host publication: CGO 2020 - Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization
Editors: Jason Mars, Lingjia Tang, Jingling Xue, Peng Wu
Publisher: Association for Computing Machinery, Inc
Pages: 280-292
Number of pages: 13
ISBN (Electronic): 9781450370479
DOIs
Publication status: Published - 2020 Feb 22
Event: 18th ACM/IEEE International Symposium on Code Generation and Optimization, CGO 2020 - San Diego, United States
Duration: 2020 Feb 22 – 2020 Feb 26

Publication series

Name: CGO 2020 - Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization

Conference

Conference: 18th ACM/IEEE International Symposium on Code Generation and Optimization, CGO 2020
Country/Territory: United States
City: San Diego
Period: 20/2/22 – 20/2/26

Bibliographical note

Funding Information:
All our scripts are customizable. You can change the input data or TOQ by modifying Benchmark/Polybench-1.0/run_all.sh. You can also observe the performance improvements on different systems by changing the system configuration. To apply our technique to other benchmarks, our framework requires application information such as build scripts, execution scripts, data transfer time, kernel execution time, and accuracy for each execution trial. Once system inspection is finished, you can apply PreScaler to other OpenCL applications by running PreScaler/bin/-precision_scaler/framework.

Acknowledgments: This work was supported by Samsung Research Funding & Incubation Center of Samsung Electronics under Project Number SRFC-IT1901-03. Yongjun Park is the corresponding author.

Publisher Copyright:
© 2020 Association for Computing Machinery.

All Science Journal Classification (ASJC) codes

  • Applied Mathematics
  • Computer Science Applications
  • Control and Optimization
  • Computational Theory and Mathematics
