Performance analysis and design studies of large-scale many-core processor architectures require simulation techniques that are both fast and highly accurate, so that studies can be completed in a reasonable amount of time. State-of-the-art graphics processing units (GPUs), which are widely used as coprocessors in high-performance computing, likewise need fast simulation techniques because their microarchitectures are highly complex and contain thousands of processing elements. However, current GPU simulators are not fast enough for advanced software and architecture studies. In this study, we propose a new parallel simulation framework and a new parallel simulation technique for improving GPU simulation speed. The proposed framework enables multithreaded simulation by exploiting both the architectural-level parallelism and the execution-model parallelism of GPUs. In addition, an error-predictive synchronization scheme based on a timing-error prediction mechanism is used to minimize cycle errors and simulator slowdown during parallel simulation. Experimental results obtained using a simulator built on the proposed framework show that the proposed technique achieves a speedup of up to 8.9 times over an existing single-threaded GPU simulator on a 16-core machine.
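To make the synchronization trade-off concrete, here is a minimal sketch of generic quantum-based parallel simulation. It is not the authors' framework or their error-predictive scheme. All names (`CoreModel`, `simulate`, `quantum`) are hypothetical. Each simulated core, a toy stand-in for a GPU streaming multiprocessor, runs on its own thread and advances `quantum` cycles on its own. All threads then meet at a barrier, where a real simulator would exchange cross-core events such as memory requests. A smaller quantum synchronizes more often, which keeps timing more accurate but adds overhead. A larger quantum synchronizes less often, but cross-core events can arrive late, which introduces cycle errors.

```python
import threading
from dataclasses import dataclass


@dataclass
class CoreModel:
    """Toy stand-in for one GPU core/SM (hypothetical model, not from the paper)."""
    core_id: int
    cycle: int = 0
    retired: int = 0

    def tick(self) -> None:
        # Advance one simulated cycle; one instruction retires every
        # (core_id + 1) cycles so that cores progress at different rates.
        self.cycle += 1
        if self.cycle % (self.core_id + 1) == 0:
            self.retired += 1


def simulate(num_cores: int, total_cycles: int, quantum: int):
    """Simulate each core on its own thread, synchronizing every `quantum` cycles.

    Returns the core models and the number of global synchronization epochs.
    """
    cores = [CoreModel(i) for i in range(num_cores)]
    epochs = [0]  # mutable counter updated by the barrier action

    def on_barrier() -> None:
        # Executed by exactly one thread after all threads reach the barrier.
        # A real simulator would deliver cross-core events here.
        epochs[0] += 1

    barrier = threading.Barrier(num_cores, action=on_barrier)

    def worker(core: CoreModel) -> None:
        while core.cycle < total_cycles:
            end = min(core.cycle + quantum, total_cycles)
            while core.cycle < end:  # run freely within the quantum
                core.tick()
            barrier.wait()  # global synchronization point

    threads = [threading.Thread(target=worker, args=(c,)) for c in cores]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cores, epochs[0]
```

For example, `simulate(4, 1000, 100)` needs 10 synchronization epochs, while `simulate(4, 1000, 1)` needs 1000. Because of CPython's global interpreter lock, this sketch shows the structure of the approach rather than real speedup.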
Bibliographical note
Funding Information:
This paper is an extension of our previous study, "Parallel GPU Architecture Simulation Framework Exploiting Work Allocation Unit Parallelism," which appeared in the 2013 IEEE International Symposium on Performance Analysis of Systems and Software. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) [NRF-2015R1A2A2A01008281].
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Hardware and Architecture
- Computational Theory and Mathematics