FUSE: Fusing STT-MRAM into GPUs to alleviate off-chip memory access overheads

Jie Zhang, Myoungsoo Jung, Mahmut Taylan Kandemir

Research output: Contribution to journal › Article › peer-review

Abstract

In this work, we propose FUSE, a novel GPU cache system that integrates spin-transfer torque magnetic random-access memory (STT-MRAM) into the on-chip L1D cache. FUSE can minimize the number of outgoing memory accesses over the interconnection network of the GPU's multiprocessors, which in turn can considerably improve the level of massive computing parallelism in GPUs. Specifically, FUSE predicts the read level of GPU memory accesses by extracting GPU runtime information and places write-once-read-multiple (WORM) data blocks into the STT-MRAM, while accommodating write-multiple data blocks in a small portion of SRAM in the L1D cache. To further reduce off-chip memory accesses, FUSE also allows WORM data blocks to be allocated anywhere in the STT-MRAM by approximating full associativity with a limited number of tag comparators and I/O peripherals. Our evaluation results show that, in comparison to a traditional GPU cache, our proposed heterogeneous cache reduces the number of outgoing memory references by 32% across the interconnection network, thereby improving the overall performance by 217% and reducing energy cost by 53%.
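
The placement idea in the abstract (WORM blocks go to STT-MRAM, write-multiple blocks go to SRAM) can be illustrated with a minimal sketch. This is not the FUSE predictor: the abstract says FUSE predicts read levels online from GPU runtime information, whereas the toy below simply counts reads and writes over a complete trace. The function name `classify_blocks`, the `(op, addr)` trace format, and the thresholds (at most one write, at least two reads) are assumptions made purely for illustration.

```python
from collections import Counter


def classify_blocks(trace):
    """Map each block address in a trace to a hypothetical cache region.

    trace: iterable of (op, addr) pairs, where op is "R" or "W".
    Returns a dict addr -> "STT-MRAM" for blocks that look
    write-once-read-multiple (WORM), and addr -> "SRAM" otherwise.

    Assumption: this is an offline counting rule for illustration only;
    it is not the online predictor described in the abstract.
    """
    reads, writes = Counter(), Counter()
    for op, addr in trace:
        if op == "R":
            reads[addr] += 1
        elif op == "W":
            writes[addr] += 1
        else:
            raise ValueError(f"unknown op {op!r}")

    placement = {}
    for addr in set(reads) | set(writes):
        # Assumed WORM rule: written at most once and read at least twice.
        is_worm = writes[addr] <= 1 and reads[addr] >= 2
        placement[addr] = "STT-MRAM" if is_worm else "SRAM"
    return placement


# Small made-up trace: block 0x10 is WORM, 0x20 is written twice,
# and 0x30 is read only once, so only 0x10 is routed to STT-MRAM.
trace = [
    ("W", 0x10), ("R", 0x10), ("R", 0x10),
    ("W", 0x20), ("W", 0x20), ("R", 0x20),
    ("R", 0x30),
]
placement = classify_blocks(trace)
```

The rule reflects the motivation implied by the abstract: blocks that are written repeatedly stay in SRAM, while read-dominated blocks are steered to the STT-MRAM portion of the L1D cache.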

Original language: English
Journal: Unknown Journal
Publication status: Published - 2019 Mar 5

All Science Journal Classification (ASJC) codes

  • General
