Graphics processing units (GPUs) are commonly used to accelerate emerging applications such as big data processing and machine learning. While GPUs have proven effective, approximate computing, which trades accuracy for performance, is one of the most common ways to improve performance further. Precision scaling of originally high-precision values into lower-precision values has recently become the most widely used GPU-side approximation technique, helped by hardware-level half-precision support. Although several approaches for finding the optimal mixed-precision configuration of GPU-side kernels have been introduced, the total program performance gain is often low because total execution time is the sum of data transfer, type conversion, and kernel execution times. As a result, kernel-level scaling may incur high type-conversion overhead for the kernel input/output data. To address this problem, this paper proposes an automatic precision scaling framework called PreScaler that maximizes program performance at the memory-object level by considering whole OpenCL program flows. The main difficulty is that the best configuration cannot be easily predicted because of various application- and system-specific characteristics. PreScaler solves this problem using search space minimization and a decision-tree-based search process. First, it minimizes the number of test configurations based on information from system inspection and dynamic profiling. Then, it finds the best memory-object-level mixed-precision configuration using a decision-tree-based search. PreScaler achieves an average performance gain of 1.33x over the baseline while maintaining the target output quality level.
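The cost argument in the abstract can be illustrated with a small sketch. It is not PreScaler's implementation: the `Trial` record, the `best_trial` helper, the configuration names, and every timing and error value are hypothetical, chosen only to show how a configuration with a faster kernel can still lose once data transfer and type conversion are counted. The selection is a plain scan over a handful of pre-measured trials, not the decision-tree-based search the paper describes.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """Timing breakdown for one precision configuration.

    All values are made-up illustrative numbers (milliseconds),
    not measurements from the paper.
    """
    name: str
    transfer_ms: float  # host <-> device data transfer
    convert_ms: float   # type conversion of kernel input/output data
    kernel_ms: float    # kernel execution
    error: float        # output error relative to the full-precision run

    @property
    def total_ms(self) -> float:
        # Total execution time is the sum of the three components.
        return self.transfer_ms + self.convert_ms + self.kernel_ms


def best_trial(trials, max_error):
    """Return the fastest trial whose error stays within max_error."""
    acceptable = [t for t in trials if t.error <= max_error]
    return min(acceptable, key=lambda t: t.total_ms)


# Hypothetical trials: kernel-only scaling halves the kernel time but
# pays a large conversion cost; scaling at the memory-object level
# also shrinks the transferred data and avoids most conversions.
trials = [
    Trial("baseline (all double)", 40.0, 0.0, 60.0, 0.0),
    Trial("kernel-only scaling", 40.0, 35.0, 30.0, 0.004),
    Trial("memory-object scaling", 20.0, 10.0, 30.0, 0.004),
    Trial("all half", 10.0, 10.0, 20.0, 0.2),
]

baseline = trials[0]
best = best_trial(trials, max_error=0.01)  # assumed quality threshold
speedup = baseline.total_ms / best.total_ms
print(f"{best.name}: {best.total_ms:.1f} ms, speedup {speedup:.2f}x")
```

With these assumed numbers, kernel-only scaling is slower than the baseline overall despite its faster kernel, and the all-half configuration is rejected because it misses the quality threshold.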
|Title of host publication||CGO 2020 - Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization|
|Editors||Jason Mars, Lingjia Tang, Jingling Xue, Peng Wu|
|Publisher||Association for Computing Machinery, Inc|
|Number of pages||13|
|Publication status||Published - 2020 Feb 22|
|Event||18th ACM/IEEE International Symposium on Code Generation and Optimization, CGO 2020 - San Diego, United States|
Duration: 2020 Feb 22 → 2020 Feb 26
Bibliographical note
Funding Information:
All our scripts are customizable. You can change the input data or the target output quality (TOQ) by modifying Benchmark/Polybench-1.0/run_all.sh. You can also observe the performance improvements on different systems by changing the system configuration. To apply our technique to other benchmarks, our framework requires several pieces of application information, such as build scripts, execution scripts, data transfer time, kernel execution time, and accuracy for each execution trial. When system inspection is finished, you can apply PreScaler to other OpenCL applications by running PreScaler/bin/-precision_scaler/framework.

Acknowledgments: This work was supported by Samsung Research Funding & Incubation Center of Samsung Electronics under Project Number SRFC-IT1901-03. Yongjun Park is the corresponding author.
© 2020 Association for Computing Machinery.
All Science Journal Classification (ASJC) codes
- Applied Mathematics
- Computer Science Applications
- Control and Optimization
- Computational Theory and Mathematics