TY - GEN
T1 - Design and effectiveness of small-sized decoupled dispatch queues
AU - Ro, Won W.
AU - Gaudiot, Jean-Luc
N1 - Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2006
Y1 - 2006
AB - Continuing demands for high degrees of Instruction-Level Parallelism (ILP) require large dispatch queues in modern superscalar microprocessors. However, such large queues are inevitably accompanied by high circuit complexity, which correspondingly limits the pipeline clock rates. This is because most of today's designs are based upon a centralized dispatch queue that depends on global broadcast operations to wake up and select the ready instructions. As an alternative to this conventional design, we propose hierarchically distributed dispatch queues based on the access/execute decoupled architecture model. Simulation results based on 14 data-intensive benchmarks show that our DDQ (Decoupled Dispatch Queues) design achieves performance comparable to a superscalar machine with a large dispatch queue. We also show that our DDQ can be designed with small-sized, distributed dispatch queues, which consequently can be implemented with low hardware complexity and high clock rates.
UR - http://www.scopus.com/inward/record.url?scp=33749998948&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33749998948&partnerID=8YFLogxK
U2 - 10.1007/11823285_50
DO - 10.1007/11823285_50
M3 - Conference contribution
AN - SCOPUS:33749998948
SN - 3540377832
SN - 9783540377832
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 485
EP - 494
BT - Euro-Par 2006 Parallel Processing - 12th International Euro-Par Conference, Proceedings
PB - Springer Verlag
T2 - 12th International Euro-Par Conference 2006
Y2 - 28 August 2006 through 1 September 2006
ER -