Abstract
We present dataflow mirroring, architectural support for low-overhead fine-grained systolic array allocation that overcomes the limitations of prior coarse-grained spatial-multitasking Neural Processing Unit (NPU) architectures. The key idea of dataflow mirroring is to reverse the dataflows of co-located Neural Networks (NNs) in the horizontal and/or vertical directions, which allows allocation boundaries to be set between any adjacent rows and columns of a systolic array and supports up to four-way spatial multitasking. Our detailed experiments using MLPerf NNs and an NPU prototype that extends Google's TPU with dataflow mirroring show that dataflow mirroring improves multitasking performance by up to 46.4%.
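The allocation idea in the abstract can be illustrated with a minimal sketch. It is not the paper's hardware design: the names `Partition` and `partition_array` are hypothetical, and the rule that each region streams data inward from its nearest outer edge of the array is an assumption made here to show why mirrored dataflows let the boundary sit between any two adjacent rows or columns.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Partition:
    """One region of the systolic array assigned to a co-located NN (hypothetical model)."""
    rows: range   # PE rows owned by this region
    cols: range   # PE columns owned by this region
    h_flow: str   # "left_to_right" or "right_to_left"
    v_flow: str   # "top_to_bottom" or "bottom_to_top"


def partition_array(n_rows: int, n_cols: int,
                    row_split: int, col_split: int) -> List[Partition]:
    """Split an n_rows x n_cols PE array at one row and one column boundary.

    Assumption (not taken from the source): regions left of / above the
    boundary keep the default dataflow, while regions right of / below it use
    the mirrored dataflow, so every region is fed from its own array edge.
    Empty regions are dropped, which yields one to four partitions.
    """
    if not (0 <= row_split <= n_rows and 0 <= col_split <= n_cols):
        raise ValueError("split points must lie within the array")

    row_bands = [(range(0, row_split), "top_to_bottom"),
                 (range(row_split, n_rows), "bottom_to_top")]
    col_bands = [(range(0, col_split), "left_to_right"),
                 (range(col_split, n_cols), "right_to_left")]

    return [Partition(rows, cols, h_flow, v_flow)
            for rows, v_flow in row_bands
            for cols, h_flow in col_bands
            if len(rows) > 0 and len(cols) > 0]


if __name__ == "__main__":
    # Four-way split of a small 8 x 8 array at row 3 and column 5.
    for part in partition_array(8, 8, row_split=3, col_split=5):
        print(part)
```

Because the split points are plain integers, the boundary in this sketch can fall between any two adjacent rows or columns, and moving a split point to an array edge reduces the allocation to two-way or single-tenant use.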
Original language | English |
---|---|
Title of host publication | 2021 58th ACM/IEEE Design Automation Conference, DAC 2021 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 247-252 |
Number of pages | 6 |
ISBN (Electronic) | 9781665432740 |
Publication status | Published - 2021 Dec 5 |
Event | 58th ACM/IEEE Design Automation Conference, DAC 2021 - San Francisco, United States; Duration: 2021 Dec 5 → 2021 Dec 9 |
Publication series
Name | Proceedings - Design Automation Conference |
---|---|
Volume | 2021-December |
ISSN (Print) | 0738-100X |
Conference
Conference | 58th ACM/IEEE Design Automation Conference, DAC 2021 |
---|---|
Country/Territory | United States |
City | San Francisco |
Period | 2021 Dec 5 → 2021 Dec 9 |
Bibliographical note
Funding Information: We proposed dataflow mirroring, lightweight architectural support for fine-grained systolic array allocation. By reversing the dataflows of co-located NNs, dataflow mirroring allows allocation boundaries to be set between any adjacent PE rows and columns. We then designed FGSpMt-NPU, a highly efficient spatial-multitasking NPU architecture that implements dataflow mirroring to achieve higher hardware utilization and performance than the existing coarse-grained spatial-multitasking NPU architecture. By enabling fine-grained distribution of the systolic array to co-located NNs, FGSpMt-NPU can greatly improve multitasking performance over the state of the art.
Acknowledgements: This work was supported by the National Research Foundation of Korea (NRF) grant (No. 2020R1F1A1069742) and the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant (No. 2020-0-01361, Artificial Intelligence Graduate School Program (Yonsei University)) funded by the Korea government (MSIT), and by the Yonsei University Research Fund (2020-22-0511, 2021-22-0001). Youngsok Kim is the corresponding author of this paper.
Publisher Copyright:
© 2021 IEEE.
All Science Journal Classification (ASJC) codes
- Computer Science Applications
- Control and Systems Engineering
- Electrical and Electronic Engineering
- Modelling and Simulation