Factored radix-8 systolic array for tensor processing

Inayat Ullah, Kashif Inayat, Joon Sung Yang, Jaeyong Chung

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)


Systolic arrays are regaining attention as the core of machine learning accelerators. This paper shows that a large design space exists at the logic level despite the simple structure of systolic arrays, and proposes a novel systolic array based on factoring and radix-8 multipliers. The factored systolic array (FSA) extracts the Booth encoding and the hard-multiple generation, which are common across all processing elements, reducing the delay and the area of the whole systolic array. This factoring comes at the cost of an increased number of registers; however, the reduced pipeline register requirement in radix-8 offsets this effect. The proposed factored 16-bit multiplier achieves up to 15%, 13%, and 23% better delay, area, and power, respectively, compared with radix-4 multipliers, even when the register overhead is included. The proposed FSA architecture improves delay, area, and power by up to 11%, 20%, and 31%, respectively, for different bitwidths when compared with the conventional radix-4 systolic array.
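The factoring idea in the abstract can be illustrated with a small behavioral model. The sketch below is not the authors' design or RTL. It only assumes the textbook radix-8 Booth recoding, in which a 16-bit operand becomes 6 digits in the range -4..4 (radix-4 would need 8 digits), and the hard multiple is 3y. All function names are hypothetical. The point it shows is that the recoding depends only on one operand and the multiples only on the other, so each can be computed once outside the processing elements and then reused.

```python
def booth_radix8_digits(x: int, bits: int = 16) -> list[int]:
    """Recode a signed `bits`-wide integer into radix-8 Booth digits in [-4, 4]."""
    assert -(1 << (bits - 1)) <= x < (1 << (bits - 1)), "x out of range"
    n_digits = -(-bits // 3)  # ceil(bits / 3): 6 digits for 16 bits

    def bit(i: int) -> int:
        # Two's-complement bit i; Python's >> sign-extends negative ints.
        return 0 if i < 0 else (x >> i) & 1

    digits = []
    for k in range(n_digits):
        b = 3 * k
        # Standard radix-8 Booth digit from bits b+2, b+1, b and the overlap bit b-1.
        digits.append(-4 * bit(b + 2) + 2 * bit(b + 1) + bit(b) + bit(b - 1))
    return digits


def multiples_of(y: int) -> list[int]:
    """Return [0, y, 2y, 3y, 4y]; only 3y (the hard multiple) needs an adder."""
    return [0, y, y << 1, y + (y << 1), y << 2]


def pe_multiply(digits: list[int], multiples: list[int]) -> int:
    """What is left for a processing element: select, negate, shift, accumulate."""
    acc = 0
    for k, d in enumerate(digits):
        pp = multiples[abs(d)]
        acc += (-pp if d < 0 else pp) << (3 * k)
    return acc


def factored_outer_product(xs: list[int], ys: list[int]) -> list[list[int]]:
    """Toy 2-D grid: recoding and multiples are computed once per operand
    and shared by every element in the corresponding row or column."""
    shared_digits = [booth_radix8_digits(x) for x in xs]   # once per x
    shared_multiples = [multiples_of(y) for y in ys]       # once per y
    return [[pe_multiply(d, m) for m in shared_multiples] for d in shared_digits]


if __name__ == "__main__":
    xs, ys = [7, -1, 12345, -32768], [3, -5, 32767]
    grid = factored_outer_product(xs, ys)
    assert grid == [[x * y for y in ys] for x in xs]
```

In this model a grid of len(xs) × len(ys) elements performs only len(xs) recodings and len(ys) hard-multiple additions instead of one of each per element. How the shared values are routed and registered in hardware is outside the scope of the sketch.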

Original language: English
Title of host publication: 2020 57th ACM/IEEE Design Automation Conference, DAC 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781450367257
Publication status: Published - 2020 Jul
Event: 57th ACM/IEEE Design Automation Conference, DAC 2020 - Virtual, San Francisco, United States
Duration: 2020 Jul 20 → 2020 Jul 24

Publication series

Name: Proceedings - Design Automation Conference
ISSN (Print): 0738-100X


Conference: 57th ACM/IEEE Design Automation Conference, DAC 2020
Country/Territory: United States
City: Virtual, San Francisco

Bibliographical note

Funding Information:
This work was supported by Samsung Research Funding & Incubation Center of Samsung Electronics under Project Number SRFC-TB1803-02. Correspondence to: Jaeyong Chung <jychung@inu.ac.kr>.

In recent years, deep learning has demonstrated predictive performance unmatched by any other known method. Deep neural networks have replaced many hand-crafted algorithms in various fields, including computer vision, image/video compression, natural language processing, and reinforcement learning [1–4]. However, deep neural networks require a massive amount of computation, which hinders the wide deployment of the models on various devices and slows down innovation in the field of artificial intelligence. Thus, the demand for more compute power is …

Publisher Copyright:
© 2020 IEEE.

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Modelling and Simulation

