Refine and recycle: A method to increase decompression parallelism

Jian Fang, Jianyu Chen, Jinho Lee, Zaid Al-Ars, H. Peter Hofstee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Rapid increases in storage bandwidth, combined with a desire to operate on large datasets interactively, drive the need for improvements in high-bandwidth decompression. Existing designs either process only one token per cycle or process multiple tokens per cycle with low area efficiency and/or low clock frequency. We propose two techniques to achieve high single-decoder throughput at improved efficiency by keeping only a single copy of the history data across multiple BRAMs and operating on each BRAM independently. A first stage efficiently refines the tokens into commands that operate on a single BRAM and steers each command to the appropriate BRAM. In the second stage, a relaxed execution model is used where each BRAM command executes immediately and those with invalid data are recycled to avoid stalls caused by the read-after-write dependency. We apply these techniques to Snappy decompression and implement a Snappy decompression accelerator on a CAPI2-attached FPGA platform equipped with a Xilinx VU3P FPGA. Experimental results show that our proposed method achieves up to 7.2 GB/s output throughput per decompressor, with each decompressor using 14.2% of the logic and 7% of the BRAM resources of the device. Therefore, a single decompressor can easily keep pace with an NVMe device (PCIe Gen3 x4) on a small FPGA, while a larger device, integrated on a host bridge adapter and instantiating multiple decompressors, can keep pace with the full OpenCAPI 3.0 bandwidth of 25 GB/s.
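The two-stage flow described in the abstract can be illustrated with a small behavioral model in Python. This is a sketch under assumed parameters: the bank count, line width, token format, and all function names are illustrative and are not taken from the paper's RTL. Stage 1 ("refine") splits each decoded token into commands that each touch a single BRAM line and steers them to per-bank queues; stage 2 ("recycle") issues every bank's head command immediately and re-enqueues any copy whose source data is not yet valid, instead of stalling on the read-after-write dependency.

```python
from collections import deque

NUM_BANKS = 4   # illustrative; the design keeps one history copy across many BRAMs
LINE = 8        # bytes per BRAM line (illustrative width)

def bank_of(addr):
    """Bank holding the history line that contains byte `addr`."""
    return (addr // LINE) % NUM_BANKS

def refine(tokens):
    """Stage 1: refine decoded tokens into commands that each touch a single
    BRAM line, steering every command to the queue of its destination bank."""
    queues = [deque() for _ in range(NUM_BANKS)]
    pos = 0
    for tok in tokens:
        if tok[0] == "lit":                       # ("lit", data)
            data, i = tok[1], 0
            while i < len(data):
                n = min(LINE - pos % LINE, len(data) - i)
                queues[bank_of(pos)].append(("lit", pos, data[i:i + n]))
                pos += n
                i += n
        else:                                     # ("copy", offset, length)
            _, off, length = tok
            done = 0
            while done < length:
                # Cap the chunk by `off` so an overlapping (RLE-style) copy
                # never reads bytes written by the same command.
                n = min(LINE - pos % LINE, length - done, off)
                queues[bank_of(pos)].append(("copy", pos, pos - off, n))
                pos += n
                done += n
    return queues, pos

def execute(queues, total):
    """Stage 2: each cycle, every bank issues its head command immediately; a
    copy whose source bytes are not yet valid is recycled (re-enqueued) rather
    than stalling the pipeline on the read-after-write dependency."""
    out = bytearray(total)
    valid = [False] * total
    cycles = 0
    while any(queues):
        cycles += 1
        snap = valid[:]                 # history state visible at cycle start
        for q in queues:
            if not q:
                continue
            cmd = q.popleft()
            if cmd[0] == "lit":
                _, dst, data = cmd
                out[dst:dst + len(data)] = data
                valid[dst:dst + len(data)] = [True] * len(data)
            else:
                _, dst, src, n = cmd
                if all(snap[src:src + n]):        # source already written
                    out[dst:dst + n] = out[src:src + n]
                    valid[dst:dst + n] = [True] * n
                else:
                    q.append(cmd)                 # recycle: retry next cycle

    return bytes(out), cycles

# A copy that reaches back over data still in flight gets recycled until
# its source lines land, while independent banks keep making progress.
queues, total = refine([("lit", b"abcdefgh"), ("copy", 8, 16), ("lit", b"!")])
out, cycles = execute(queues, total)
```

In this toy run the two copy commands land in different banks and are each recycled until the lines they read become valid, so the whole stream drains in three cycles with no bank ever stalling. The real accelerator does this per-BRAM at the clock level; the model only mirrors the ordering behavior.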

Original language: English
Title of host publication: Proceedings - 2019 IEEE 30th International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 272-280
Number of pages: 9
ISBN (Electronic): 9781728116013
DOI: 10.1109/ASAP.2019.00017
Publication status: Published - 2019 Jul
Event: 30th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2019 - New York, United States
Duration: 2019 Jul 15 - 2019 Jul 17

Publication series

Name: Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors
Volume: 2019-July
ISSN (Print): 1063-6862

Conference

Conference: 30th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2019
Country: United States
City: New York
Period: 19/7/15 - 19/7/17

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture
  • Computer Networks and Communications


Cite this

Fang, J., Chen, J., Lee, J., Al-Ars, Z., & Hofstee, H. P. (2019). Refine and recycle: A method to increase decompression parallelism. In Proceedings - 2019 IEEE 30th International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2019 (pp. 272-280). [8825015] (Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors; Vol. 2019-July). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ASAP.2019.00017