Personalized recommendation systems have become a major AI application in modern data centers. The main challenges in processing personalized recommendation inference are the large memory footprint and high bandwidth requirement of embedding layers. To overcome the capacity limit and bandwidth congestion of on-chip memory, near-memory processing (NMP) is a promising solution. Recent work on accelerating personalized recommendation proposes a DIMM-based NMP design that solves the bandwidth problem and increases memory capacity. The performance of NMP is determined by the internal bandwidth, and the prior DIMM-based approach uses more DIMMs to achieve higher operation throughput. However, increasing the number of DIMMs could eventually lead to significant power consumption due to inefficient scaling. We propose SPACE, a novel heterogeneous memory architecture that is efficient in both performance and energy. SPACE combines a compute-capable 3D-stacked DRAM with DIMMs for personalized recommendation. Before designing the proposed system, we quantitatively analyze user/item interactions and define two localities: gather locality and reduction locality. In gather operations, we find that only a small proportion of items are highly accessed by users; we call this gather locality. We define reduction locality as the reusability of gathered items in reduction operations. Based on gather locality, SPACE allocates highly accessed embedding items to the 3D-stacked DRAM to achieve the maximum bandwidth. Then, by exploiting reduction locality, we use the remaining space of the 3D-stacked DRAM to store and reuse repeated partial sums, thereby minimizing the required number of element-wise reduction operations. Our evaluation shows that SPACE achieves a 3.2× performance improvement and 56% energy saving over the previous DIMM-based NMPs while leveraging a 3D-stacked DRAM 1/8 the size of the DIMMs. Also, compared to state-of-the-art DRAM cache designs with the same NMP configuration, SPACE achieves an average performance improvement of 32.7%.
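The two localities named in the abstract can be illustrated with a small software sketch. This is a toy model under stated assumptions, not the SPACE hardware design: the function names, the frequency-based choice of hot rows, the single cached partial sum, and the sample data are all hypothetical, and each query is assumed to contain unique row ids.

```python
from collections import Counter

def pick_hot_rows(queries, capacity):
    """Gather locality: return the `capacity` most frequently gathered row ids."""
    counts = Counter(row for query in queries for row in query)
    return {row for row, _ in counts.most_common(capacity)}

def gather_reduce(table, query, hot_rows, stats):
    """Sum the embedding rows in `query`, tallying which memory tier served each gather."""
    dim = len(table[0])
    out = [0.0] * dim
    for row in query:
        # "fast" stands in for the 3D-stacked DRAM, "slow" for the DIMMs (assumption).
        stats["fast" if row in hot_rows else "slow"] += 1
        for d in range(dim):
            out[d] += table[row][d]
    return out

def reduce_with_reuse(table, query, cached_ids, cached_sum):
    """Reduction locality: reuse a stored partial sum when the query contains all cached ids.

    Returns the reduced vector and the number of vectors that were accumulated.
    """
    dim = len(table[0])
    if set(cached_ids) <= set(query):
        operands = [cached_sum] + [table[r] for r in query if r not in cached_ids]
    else:
        operands = [table[r] for r in query]
    out = [0.0] * dim
    for vec in operands:
        for d in range(dim):
            out[d] += vec[d]
    return out, len(operands)

# Hypothetical toy data: 8 embedding rows of dimension 2, four user queries.
table = [[float(r), float(r) * 2] for r in range(8)]
queries = [[0, 1, 5], [0, 1, 2], [0, 1, 7], [0, 3]]

hot_rows = pick_hot_rows(queries, capacity=2)
stats = Counter()
results = [gather_reduce(table, q, hot_rows, stats) for q in queries]

# Store the partial sum of the frequently co-occurring rows 0 and 1, then reuse it.
cached_ids = (0, 1)
cached_sum = gather_reduce(table, list(cached_ids), hot_rows, Counter())
reused = [reduce_with_reuse(table, q, cached_ids, cached_sum) for q in queries]
```

In this toy run, most gathers are served by the two hot rows, and reusing the stored partial sum lowers the number of accumulated vectors while producing the same reduced results.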
|Title of host publication||Proceedings - 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture, ISCA 2021|
|Publisher||Institute of Electrical and Electronics Engineers Inc.|
|Number of pages||13|
|Publication status||Published - 2021 Jun|
|Event||48th ACM/IEEE Annual International Symposium on Computer Architecture, ISCA 2021 - Virtual, Online, Spain|
Duration: 2021 Jun 14 → 2021 Jun 19
|Name||Proceedings - International Symposium on Computer Architecture|
|Conference||48th ACM/IEEE Annual International Symposium on Computer Architecture, ISCA 2021|
|Period||21/6/14 → 21/6/19|
Bibliographical note
Funding Information:
This research was supported by the MOTIE (Ministry of Trade, Industry and Energy) (No. 10080590, Technology Development of Unified Memory System for Heterogeneous System Architecture) and the KSRC (Korea Semiconductor Research Consortium) support program for the development of the future semiconductor device. This work was also supported by an Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-00853, Developing Software Platform for Programming of PIM) and by Samsung Electronics Co., Ltd (IO201210-07936-01).
W. W. Ro is the corresponding author.
© 2021 IEEE.
All Science Journal Classification (ASJC) codes
- Hardware and Architecture