Analytical models can help computer architects perform early-stage design space exploration orders of magnitude faster than cycle-level simulators. To facilitate rapid design space exploration for graphics processing units (GPUs), prior studies have proposed GPU analytical models which capture first-order stall events causing performance degradation; however, the existing analytical models cannot accurately model modern GPUs due to their outdated and highly abstract GPU core microarchitecture assumptions. Therefore, to accurately evaluate the performance of modern GPUs, we need a new GPU analytical model which accurately captures the stall events incurred by the significant changes in the core microarchitectures of modern GPUs. We propose GCoM, an accurate GPU analytical model which faithfully captures the key core-side stall events of modern GPUs. Through detailed microarchitecture-driven GPU core modeling, GCoM accurately models modern GPUs by revealing the following key core-side stalls overlooked by the existing GPU analytical models. First, GCoM identifies the compute structural stall events caused by the limited per-sub-core functional units. Second, GCoM exposes the memory structural stalls due to the limited banks and shared nature of per-core L1 data caches. Third, GCoM correctly predicts the memory data stalls induced by the sectored L1 data caches, which split a cache line into a set of sectors sharing the same tag. Fourth, GCoM captures the idle stalls incurred by the inter- and intra-core load imbalances. Our experiments using an NVIDIA RTX 2060 configuration show that GCoM greatly improves the modeling accuracy by achieving a mean absolute error of 10.0% against the Accel-Sim cycle-level simulator, whereas the state-of-the-art GPU analytical model achieves a mean absolute error of 44.9%.
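To illustrate the sectored-cache behavior mentioned in the abstract (a cache line split into sectors that share one tag), the following is a minimal toy sketch, not GCoM's model. The 128-byte line size, the 32-byte sector size, the `SectoredCache` class, and the absence of eviction are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field

LINE_BYTES = 128    # assumed cache line size (illustrative)
SECTOR_BYTES = 32   # assumed sector size, i.e. 4 sectors per line (illustrative)


@dataclass
class SectoredCache:
    """Toy sectored cache: unlimited capacity, no eviction (hypothetical)."""
    # Maps a line tag to the set of sector indices currently valid in that line.
    lines: dict = field(default_factory=dict)

    def access(self, addr: int) -> str:
        tag = addr // LINE_BYTES
        sector = (addr % LINE_BYTES) // SECTOR_BYTES
        valid = self.lines.get(tag)
        if valid is None:
            # Tag not present: allocate the line and fetch only this sector.
            self.lines[tag] = {sector}
            return "line miss"
        if sector not in valid:
            # Tag matches but the requested sector was never fetched.
            valid.add(sector)
            return "sector miss"
        return "hit"


cache = SectoredCache()
results = [cache.access(a) for a in (0, 4, 32, 36, 128)]
print(results)
```

In this sketch, the access to address 32 finds a matching tag but still misses, because only sector 0 of that line had been fetched; this tag-hit, sector-miss case is the kind of event a model assuming whole-line fills would not count.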
|Title of host publication||ISCA 2022 - Proceedings of the 49th Annual International Symposium on Computer Architecture|
|Publisher||Institute of Electrical and Electronics Engineers Inc.|
|Number of pages||13|
|Publication status||Published - 2022 Jun 18|
|Event||49th IEEE/ACM International Symposium on Computer Architecture, ISCA 2022 - New York, United States|
Duration: 2022 Jun 18 → 2022 Jun 22
|Name||Proceedings - International Symposium on Computer Architecture|
|Conference||49th IEEE/ACM International Symposium on Computer Architecture, ISCA 2022|
|Period||22/6/18 → 22/6/22|
Bibliographical note
Funding Information:
This work was partly supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2020R1F1A1069742, 2022R1C1C1008131, 2022R1C1C1011307, 2021R1F1A1062902), Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (2021-0-00853, 2020-0-01361, 2021-0-02051), and the Yonsei Signature Research Cluster Program (2022-22-0002). Jounghoo Lee, Yeonan Ha, and Suhyun Lee are with the Department of Computer Science at Yonsei University and are partly supported by the BK21 FOUR (Fostering Outstanding Universities for Research) funded by the Ministry of Education (MOE) of Korea and National Research Foundation (NRF) of Korea.
© 2022 Copyright held by the owner/author(s). Publication rights licensed to ACM.
All Science Journal Classification (ASJC) codes
- Hardware and Architecture