This paper conducts a detailed study of the factors affecting the operation stalls in terms of the fetch group size on the warp scheduler of GPUs. Throughout this paper, we reveal that the size of a fetch group is highly involved for hiding various types of operation stalls: Short latency stalls, long latency stalls, and Load/Store Unit (LSU) stalls. The scheduler with a small fetch group cannot hide short latency stalls due to the limited number of warps in a fetch group. In contrast, the scheduler with a large fetch group cannot hide long latency and LSU stalls due to the limited number of fetch groups and the lack of memory subsystems, respectively. To hide various types of stalls, this paper proposes a Dynamic Resizing on Active Warps (DRAW) scheduler which adjusts the size of a fetch group dynamically based on the execution phases of applications. For the applications that have the best performance at LRR (one fetch group), the DRAW scheduler matches the performance of LRR and outperforms TL (multiple fetch groups) by 22.7 percent. In addition, for the applications that have the best performance at TL, our scheduler achieves 11.0 and 5.5 percent better performance compared to LRR and TL, respectively.
|Number of pages||15|
|Journal||IEEE Transactions on Parallel and Distributed Systems|
|Publication status||Published - 2017 Nov 1|
Bibliographical noteFunding Information:
This work is supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2015R1A2A2A01008281). This paper is an extension of our previous study, “DRAW: Investigating benefits of adaptive fetch group size on GPU,” which appeared in the 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2015). W. W. Ro is the corresponding author.
© 2017 IEEE.
All Science Journal Classification (ASJC) codes
- Signal Processing
- Hardware and Architecture
- Computational Theory and Mathematics