Abstract
Systolic arrays are regaining attention as the core of accelerators for machine learning workloads. This paper shows that a large design space exists at the logic level despite the simple structure of systolic arrays, and it proposes a novel systolic array based on factoring and radix-8 multipliers. The factored systolic array (FSA) extracts the Booth encoding and the hard-multiple generation, which are common across all processing elements, reducing the delay and area of the whole systolic array. This factoring comes at the cost of an increased number of registers, but the reduced pipeline-register requirement of radix-8 multipliers offsets this effect. The proposed factored 16-bit multiplier achieves up to 15%, 13%, and 23% better delay, area, and power, respectively, than radix-4 multipliers, even when the register overhead is included. The proposed FSA architecture improves delay, area, and power by up to 11%, 20%, and 31%, respectively, across different bitwidths compared with a conventional radix-4 systolic array.
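To make the factoring idea concrete, below is a minimal Python sketch of radix-8 Booth recoding, written for this summary rather than taken from the paper (the function names and bit-widths are our own assumptions, not the authors' RTL). Among the multiples of the multiplicand that a radix-8 digit can select, only 3x (the hard multiple) requires a carry-propagate addition, which is why computing it, together with the Booth encoding, once per shared operand and reusing it across all processing elements reduces per-PE delay and area.

```python
# Minimal sketch of radix-8 Booth recoding and the 3x "hard multiple".
# This illustrates the underlying arithmetic only; it is not the paper's design.

def booth_radix8_digits(y: int, bits: int = 16):
    """Recode a two's-complement multiplier y into radix-8 Booth digits in
    {-4, ..., 4}: d_i = -4*y[3i+2] + 2*y[3i+1] + y[3i] + y[3i-1], with y[-1] = 0.
    Python ints behave as infinitely sign-extended two's complement, so
    (y >> i) & 1 reads bit i directly."""
    def bit(i):
        return (y >> i) & 1 if i >= 0 else 0
    return [-4 * bit(3*i + 2) + 2 * bit(3*i + 1) + bit(3*i) + bit(3*i - 1)
            for i in range((bits + 2) // 3)]

def booth_radix8_mult(x: int, y: int, bits: int = 16) -> int:
    """Multiply x*y using radix-8 Booth digits. Every required multiple of x
    (1x, 2x, 4x) is a shift except 3x, the hard multiple, which needs one
    carry-propagate addition. In the FSA, that addition and the Booth encoding
    of the shared operand are factored out of the processing elements."""
    triple_x = x + (x << 1)                      # the one true addition: 3x
    multiples = {0: 0, 1: x, 2: x << 1, 3: triple_x, 4: x << 2}
    acc = 0
    for i, d in enumerate(booth_radix8_digits(y, bits)):
        m = multiples[abs(d)]
        acc += (m if d >= 0 else -m) << (3 * i)  # shift into radix-8 position
    return acc

# Quick check against plain multiplication over representative 16-bit operands.
assert all(booth_radix8_mult(a, b) == a * b
           for a in (-7, 0, 123) for b in (-32768, -1, 5, 32767))
```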
Original language | English |
---|---|
Title of host publication | 2020 57th ACM/IEEE Design Automation Conference, DAC 2020 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9781450367257 |
DOIs | |
Publication status | Published - 2020 Jul |
Event | 57th ACM/IEEE Design Automation Conference, DAC 2020 - Virtual, San Francisco, United States; Duration: 2020 Jul 20 → 2020 Jul 24 |
Publication series
Name | Proceedings - Design Automation Conference |
---|---|
Volume | 2020-July |
ISSN (Print) | 0738-100X |
Conference
Conference | 57th ACM/IEEE Design Automation Conference, DAC 2020 |
---|---|
Country/Territory | United States |
City | Virtual, San Francisco |
Period | 2020 Jul 20 → 2020 Jul 24 |
Bibliographical note
Funding Information: This work was supported by Samsung Research Funding & Incubation Center of Samsung Electronics under Project Number SRFC-TB1803-02. Correspondence to: Jaeyong Chung <jychung@inu.ac.kr>.
Publisher Copyright:
© 2020 IEEE.
All Science Journal Classification (ASJC) codes
- Computer Science Applications
- Control and Systems Engineering
- Electrical and Electronic Engineering
- Modelling and Simulation