An instruction-systolic programmable shader architecture for multi-threaded 3D graphics processing

Jung Wook Park, Hoon Mo Yang, Gi Ho Park, Shin-Dug Kim, Charles C. Weems

Research output: Contribution to journal (Article)

2 Citations (Scopus)

Abstract

In order to guarantee both performance and programmability demands in 3D graphics applications, vector and multithreaded SIMD architectures have been employed in recent graphics processing units. This paper introduces a novel instruction-systolic array architecture, which transfers an instruction stream in a pipelined fashion to efficiently share the expensive functional resources of a graphics processor. Specifically, cache misses and dynamic branches can cause additional latencies and complicated management in these parallel architectures. To address this problem, we combine a systolic execution scheme with on-demand warp activation that handles cache miss latency and branch divergence efficiently without significantly increasing hardware resources, either in terms of logic or register space. Simulation indicates that the proposed architecture offers 25% better performance than a traditional SIMD architecture with the same resources, and requires significantly fewer resources to match the performance of a typical modern vector multi-threaded GPU architecture.
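
As a rough illustration of the pipelined instruction transfer mentioned in the abstract, the Python sketch below models a generic one-dimensional instruction-systolic pipeline, in which each instruction is issued to the first processing element (PE) and forwarded to the next PE on every cycle. This is not the paper's design: the function names, the one-PE-per-cycle forwarding rule, and the omission of cache misses, branch divergence, and warp activation are all assumptions made for illustration.

```python
def systolic_schedule(num_instructions, num_pes):
    """Map each cycle to the (pe, instruction) pairs active in that cycle.

    Assumed model (hypothetical, not taken from the paper): instruction i
    enters PE 0 at cycle i and moves to the next PE once per cycle, so it
    executes on PE j at cycle i + j.
    """
    schedule = {}
    for instr in range(num_instructions):
        for pe in range(num_pes):
            cycle = instr + pe
            schedule.setdefault(cycle, []).append((pe, instr))
    return schedule


def total_cycles(num_instructions, num_pes):
    """Cycles until the last instruction leaves the last PE in this model."""
    return num_instructions + num_pes - 1


if __name__ == "__main__":
    # Print which instruction each PE executes in every cycle.
    for cycle, slots in sorted(systolic_schedule(4, 3).items()):
        print(cycle, sorted(slots))
```

In this simplified model a single instruction stream feeds every PE, and the only cost of the staggered start is the pipeline fill time of `num_pes - 1` cycles.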

Original language: English
Pages (from-to): 1110-1118
Number of pages: 9
Journal: Journal of Parallel and Distributed Computing
Volume: 70
Issue number: 11
DOI: 10.1016/j.jpdc.2010.07.002
Publication status: Published - 2010 Nov 1

Fingerprint

3D Graphics
Systolic arrays
Parallel architectures
Processing
Resources
Chemical activation
Cache
Latency
Hardware
Branch
Graphics Processors
Performance Guarantee
Graphics Processing Unit
Activation
Divergence
Architecture
Logic

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Software
  • Hardware and Architecture
  • Computer Networks and Communications
  • Artificial Intelligence

Cite this

Park, Jung Wook ; Yang, Hoon Mo ; Park, Gi Ho ; Kim, Shin-Dug ; Weems, Charles C. / An instruction-systolic programmable shader architecture for multi-threaded 3D graphics processing. In: Journal of Parallel and Distributed Computing. 2010 ; Vol. 70, No. 11. pp. 1110-1118.
@article{17dd0c14fd1d428aa4c46341478f9cd4,
title = "An instruction-systolic programmable shader architecture for multi-threaded 3D graphics processing",
abstract = "In order to guarantee both performance and programmability demands in 3D graphics applications, vector and multithreaded SIMD architectures have been employed in recent graphics processing units. This paper introduces a novel instruction-systolic array architecture, which transfers an instruction stream in a pipelined fashion to efficiently share the expensive functional resources of a graphics processor. Specifically, cache misses and dynamic branches can cause additional latencies and complicated management in these parallel architectures. To address this problem, we combine a systolic execution scheme with on-demand warp activation that handles cache miss latency and branch divergence efficiently without significantly increasing hardware resources, either in terms of logic or register space. Simulation indicates that the proposed architecture offers 25{\%} better performance than a traditional SIMD architecture with the same resources, and requires significantly fewer resources to match the performance of a typical modern vector multi-threaded GPU architecture.",
author = "Park, {Jung Wook} and Yang, {Hoon Mo} and Park, {Gi Ho} and Kim, Shin-Dug and Weems, {Charles C.}",
year = "2010",
month = "11",
day = "1",
doi = "10.1016/j.jpdc.2010.07.002",
language = "English",
volume = "70",
pages = "1110--1118",
journal = "Journal of Parallel and Distributed Computing",
issn = "0743-7315",
publisher = "Academic Press Inc.",
number = "11"
}
