We present dataflow mirroring, architectural support for low-overhead, fine-grained systolic array allocation that overcomes the limitations of prior coarse-grained spatial-multitasking Neural Processing Unit (NPU) architectures. The key idea of dataflow mirroring is to reverse the dataflows of co-located Neural Networks (NNs) in the horizontal and/or vertical directions, which allows allocation boundaries to be set between any adjacent rows and columns of a systolic array and supports up to four-way spatial multitasking. Our detailed experiments using MLPerf NNs and an NPU prototype that extends Google's TPU with dataflow mirroring show that dataflow mirroring improves multitasking performance by up to 46.4%.
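The allocation scheme described in the abstract can be illustrated with a small sketch. It is not the paper's hardware design: the `Partition` type, the `partition_array` function, and the convention that the top-left tenant keeps the unmirrored dataflow are all assumptions made for illustration. The sketch only shows that one row boundary and one column boundary, placed between any adjacent rows and columns, yield up to four rectangular regions, each with a distinct combination of horizontal and vertical mirroring.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Partition:
    """One co-located NN's rectangular share of the PE array (hypothetical type)."""
    rows: range      # PE rows owned by this NN
    cols: range      # PE columns owned by this NN
    mirror_h: bool   # True: horizontal dataflow is reversed
    mirror_v: bool   # True: vertical dataflow is reversed


def partition_array(n_rows: int, n_cols: int,
                    row_cut: int, col_cut: int) -> List[Partition]:
    """Split an n_rows x n_cols array at one row and one column boundary.

    row_cut / col_cut give the number of rows / columns on the top / left
    side. A cut of 0 or of the full dimension leaves that axis unsplit, so
    the result holds one, two, or four non-empty partitions.
    """
    if not (0 <= row_cut <= n_rows and 0 <= col_cut <= n_cols):
        raise ValueError("cut must lie on a boundary of the array")

    # Assumed convention: the top/left side keeps the original dataflow,
    # the bottom/right side uses the mirrored one.
    row_spans = [(range(0, row_cut), False), (range(row_cut, n_rows), True)]
    col_spans = [(range(0, col_cut), False), (range(col_cut, n_cols), True)]

    parts = []
    for rows, mirror_v in row_spans:
        for cols, mirror_h in col_spans:
            if len(rows) > 0 and len(cols) > 0:  # skip empty regions
                parts.append(Partition(rows, cols, mirror_h, mirror_v))
    return parts
```

For example, `partition_array(128, 128, 37, 90)` returns four partitions that together cover all 128 x 128 PEs, while `partition_array(128, 128, 128, 40)` returns two side-by-side partitions that differ only in horizontal mirroring. The 128 x 128 size is used here only as an example dimension.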
|Title of host publication||2021 58th ACM/IEEE Design Automation Conference, DAC 2021|
|Publisher||Institute of Electrical and Electronics Engineers Inc.|
|Number of pages||6|
|Publication status||Published - 2021 Dec 5|
|Event||58th ACM/IEEE Design Automation Conference, DAC 2021 - San Francisco, United States (Duration: 2021 Dec 5 → 2021 Dec 9)|
|Name||Proceedings - Design Automation Conference|
|Conference||58th ACM/IEEE Design Automation Conference, DAC 2021|
|Period||21/12/5 → 21/12/9|
Bibliographical note
Funding Information:
We proposed dataflow mirroring, lightweight architectural support for fine-grained systolic array allocation. By reversing the dataflows of co-located NNs, dataflow mirroring allows allocation boundaries to be set between any adjacent PE rows and columns. Then, we designed FGSpMt-NPU, a highly efficient spatial-multitasking NPU architecture which implements dataflow mirroring to achieve higher hardware utilization and performance over the existing coarse-grained spatial-multitasking NPU architecture. By enabling fine-grained distribution of the systolic array to co-located NNs, FGSpMt-NPU can greatly improve the multitasking performance over the state-of-the-art.

ACKNOWLEDGEMENTS
This work was supported by the National Research Foundation of Korea (NRF) grant (No. 2020R1F1A1069742) and Institute of Information & Communications Technology Planning & Evaluation (IITP) grant (No. 2020-0-01361, Artificial Intelligence Graduate School Program (Yonsei University)) funded by the Korea government (MSIT), and the Yonsei University Research Fund (2020-22-0511, 2021-22-0001). Youngsok Kim is the corresponding author of this paper.
© 2021 IEEE.
All Science Journal Classification (ASJC) codes
- Computer Science Applications
- Control and Systems Engineering
- Electrical and Electronic Engineering
- Modelling and Simulation