In this work, we propose FUSE, a novel GPU cache system that integrates spin-Transfer torque magnetic random-Access memory (STT-MRAM) into the on-chip L1D cache. FUSE can minimize the number of outgoing memory accesses over the interconnection network of GPU-s multiprocessors, which in turn can considerably improve the level of massive computing parallelism in GPUs. Specifically, FUSE predicts a read-level of GPU memory accesses by extracting GPU runtime information and places write-once-read-multiple (WORM) data blocks into the STT-MRAM, while accommodating write-multiple data blocks over a small portion of SRAM in the L1D cache. To further reduce the off-chip memory accesses, FUSE also allows WORM data blocks to be allocated anywhere in the STT-MRAM by approximating the associativity with the limited number of tag comparators and I/O peripherals. Our evaluation results show that, in comparison to a traditional GPU cache, our proposed heterogeneous cache reduces the number of outgoing memory references by 32% across the interconnection network, thereby improving the overall performance by 217% and reducing energy cost by 53%.
|Publication status||Published - 2019 Mar 5|
All Science Journal Classification (ASJC) codes