GPUpd: A fast and scalable multi-GPU architecture using cooperative projection and distribution

Youngsok Kim, Jae Eon Jo, Hanhwi Jang, Minsoo Rhu, Hanjun Kim, Jangwoo Kim

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2 Citations (Scopus)

Abstract

Graphics Processing Unit (GPU) vendors have been scaling single-GPU architectures to satisfy the ever-increasing user demands for faster graphics processing. However, as it gets extremely difficult to further scale single-GPU architectures, the vendors are aiming to achieve the scaled performance by simultaneously using multiple GPUs connected with newly developed, fast inter-GPU networks (e.g., NVIDIA NVLink, AMD XDMA). With fast inter-GPU networks, it is now promising to employ split frame rendering (SFR) which improves both frame rate and single-frame latency by assigning disjoint regions of a frame to different GPUs. Unfortunately, the scalability of current SFR implementations is seriously limited as they suffer from a large amount of redundant computation among GPUs. This paper proposes GPUpd, a novel multi-GPU architecture for fast and scalable SFR. With small hardware extensions, GPUpd introduces a new graphics pipeline stage called Cooperative Projection & Distribution (C-PD) where all GPUs cooperatively project 3D objects to 2D screen and efficiently redistribute the objects to their corresponding GPUs. C-PD not only eliminates the redundant computation among GPUs, but also incurs minimal inter-GPU network trafic by transferring object IDs instead of mid-pipeline outcomes between GPUs. To further reduce the redistribution overheads, GPUpd minimizes inter-GPU synchronizations by implementing batching and runahead-execution of draw commands. Our detailed cycle-level simulations with 8 real-world game traces show that GPUpd achieves a geomean speedup of 4.98× in single-frame latency with 16 GPUs, whereas the current SFR implementations achieve only 3.07× geomean speedup which saturates on 4 or more GPUs.

Original languageEnglish
Title of host publicationMICRO 2017 - 50th Annual IEEE/ACM International Symposium on Microarchitecture Proceedings
PublisherIEEE Computer Society
Pages574-586
Number of pages13
ISBN (Electronic)9781450349529
DOIs
Publication statusPublished - 2017 Oct 14
Event50th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2017 - Cambridge, United States
Duration: 2017 Oct 142017 Oct 18

Publication series

NameProceedings of the Annual International Symposium on Microarchitecture, MICRO
VolumePart F131207
ISSN (Print)1072-4451

Other

Other50th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2017
CountryUnited States
CityCambridge
Period17/10/1417/10/18

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture

Fingerprint Dive into the research topics of 'GPUpd: A fast and scalable multi-GPU architecture using cooperative projection and distribution'. Together they form a unique fingerprint.

  • Cite this

    Kim, Y., Jo, J. E., Jang, H., Rhu, M., Kim, H., & Kim, J. (2017). GPUpd: A fast and scalable multi-GPU architecture using cooperative projection and distribution. In MICRO 2017 - 50th Annual IEEE/ACM International Symposium on Microarchitecture Proceedings (pp. 574-586). (Proceedings of the Annual International Symposium on Microarchitecture, MICRO; Vol. Part F131207). IEEE Computer Society. https://doi.org/10.1145/3123939.3123968