Continuing demand for high degrees of instruction-level parallelism (ILP) requires large dispatch queues (or centralized reservation stations) in modern superscalar microprocessors. However, large dispatch queues are inevitably accompanied by high circuit complexity, which limits the pipeline clock rate. In other words, increasing the size of the dispatch queue ultimately hinders attempts to increase the clock speed. This is because most of today's designs are based on a centralized dispatch queue that relies on global broadcast operations to wake up and select ready instructions. As an alternative to this conventional design, we propose hierarchically distributed dispatch queues based on access/execute decoupled architectures. Simulation results on 14 data-intensive benchmarks show that our DDQ (Decoupled Dispatch Queues) design achieves performance comparable to that of a superscalar machine with a large dispatch queue, while using small, distributed dispatch queues that can be implemented with low hardware complexity and high clock rates.
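To make the broadcast wakeup-and-select step of a conventional centralized dispatch queue concrete, the following is a minimal software sketch. It is not the DDQ design and is not taken from the paper: the names `Entry`, `CentralDispatchQueue`, and `tag_compares` are hypothetical, and counting tag comparisons is only an assumed, rough proxy for the cost of the broadcast, which in real hardware is a matter of wire delay and comparator count.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One waiting instruction (hypothetical model, not the paper's)."""
    name: str
    waiting_on: set = field(default_factory=set)  # source tags not yet produced

    @property
    def ready(self) -> bool:
        return not self.waiting_on


class CentralDispatchQueue:
    """Toy centralized queue with broadcast wakeup and oldest-first select."""

    def __init__(self):
        self.entries = []      # kept in program (age) order
        self.tag_compares = 0  # assumed proxy for broadcast wakeup work

    def insert(self, name, sources=()):
        self.entries.append(Entry(name, set(sources)))

    def wakeup(self, tag):
        # Global broadcast: every entry in the queue checks the result tag,
        # so the work per broadcast grows with the queue size.
        for entry in self.entries:
            self.tag_compares += 1
            entry.waiting_on.discard(tag)

    def select(self, width):
        # Issue up to `width` ready entries, oldest first.
        issued, remaining = [], []
        for entry in self.entries:
            if entry.ready and len(issued) < width:
                issued.append(entry.name)
            else:
                remaining.append(entry)
        self.entries = remaining
        return issued


queue = CentralDispatchQueue()
queue.insert("i1")                # no outstanding sources
queue.insert("i2", ["r1"])        # waits for the producer of r1
queue.insert("i3", ["r1", "r2"])  # waits for r1 and r2
first_issue = queue.select(2)     # only i1 is ready at this point
queue.wakeup("r1")                # broadcast reaches both remaining entries
second_issue = queue.select(2)
```

In this toy model each `wakeup` touches every entry still in the queue, which is the scaling behavior the abstract identifies as the obstacle to combining large queues with high clock rates.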
Bibliographical note
Funding Information:
This paper is based upon work supported in part by Yonsei University (Grant No. 2007-1-0123) and in part by NSF grant CCF-0541403. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Hardware and Architecture
- Computer Networks and Communications
- Computer Graphics and Computer-Aided Design
- Artificial Intelligence