Reconstructing Out-of-Order Issue Queue

Ipoom Jeong, Jiwon Lee, Myung Kuk Yoon, Won Woo Ro

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Out-of-order cores provide high performance at the cost of energy efficiency. Dynamic scheduling is one of the major contributors to this: it generates highly optimized issue schedules that consider both data dependences and the underlying execution resources, but it relies heavily on the complex wakeup and select operations of an out-of-order issue queue (IQ). For decades, researchers have proposed complexity-effective dynamic scheduling schemes that leverage the energy efficiency of an in-order IQ. However, these schemes are either costly or unable to deliver sufficient performance to substitute for a conventional wide-issue out-of-order IQ. In this work, we revisit two previous designs: a classical dependence-based design and a state-of-the-art readiness-based design. We observe that they are complementary to each other, and thus their synergistic integration has the potential to be a good alternative to an out-of-order IQ. We first combine these two designs, and then analyze the main architectural bottlenecks that cause underutilization of the aggregate issue capability, thereby limiting the exploitation of instruction-level and memory-level parallelism: 1) memory dependences not exposed by the register-based dependence analysis and 2) the wide and shallow nature of dynamic dependence chains due to long-latency memory accesses. To this end, we propose Ballerino, a novel microarchitecture that performs balanced and cache-miss-tolerable dynamic scheduling via a complementary combination of cascaded and clustered in-order IQs. Ballerino is built upon three key functionalities: 1) speculatively filtering out ready-at-dispatch instructions, 2) eliminating wasteful wakeup operations via a simple steering technique that leverages awareness of memory dependences, and 3) reacting to program phase changes by allowing different load-dependent chains to share a single IQ while guaranteeing their out-of-order issue. The net effect is minimal scheduling energy consumption per instruction with scheduling performance comparable to that of a fully out-of-order IQ. In our analysis, Ballerino achieves performance comparable to an 8-wide out-of-order core by using twelve in-order IQs, improving core-wide energy efficiency by 20%.

Original language: English
Title of host publication: Proceedings - 2022 55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022
Publisher: IEEE Computer Society
Pages: 144-161
Number of pages: 18
ISBN (Electronic): 9781665462723
DOIs
Publication status: Published - 2022
Event: 55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022 - Chicago, United States
Duration: 2022 Oct 1 → 2022 Oct 5

Publication series

Name: Proceedings of the Annual International Symposium on Microarchitecture, MICRO
Volume: 2022-October
ISSN (Print): 1072-4451

Conference

Conference: 55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022
Country/Territory: United States
City: Chicago
Period: 22/10/1 → 22/10/5

Bibliographical note

Funding Information:
The authors would like to thank the anonymous reviewers for their valuable comments that helped to improve the quality of the paper. This work was supported in part by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-00853, Developing Software Platform for Programming of PIM), in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1G1A1092196), and in part by Samsung Electronics Company, Ltd., Hwaseong, Korea. W. W. Ro is the corresponding author.

Publisher Copyright:
© 2022 IEEE.

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture
