JParEnt

Parallel entropy decoding for JPEG decompression on heterogeneous multicore architectures

Wasuwee Sodsong, Minyoung Jung, Jinwoo Park, Bernd Burgstaller

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

The JPEG format employs Huffman codes to compress the entropy data of an image. Huffman codewords are of variable length, which makes parallel entropy decoding a difficult problem. To determine the start position of a codeword in the bitstream, the previous codeword must be decoded first. We present JParEnt, a new approach to parallel entropy decoding for JPEG decompression on heterogeneous multicores. JParEnt conducts JPEG decompression in two steps: (1) an efficient sequential scan of the entropy data on the CPU to determine the start-positions (boundaries) of coefficient blocks in the bitstream, followed by (2) a parallel entropy decoding step on the graphics processing unit (GPU). The block boundary scan constitutes a reinterpretation of the Huffman-coded entropy data to determine codeword boundaries in the bitstream. We introduce a dynamic workload partitioning scheme to account for GPUs of low compute power relative to the CPU. This configuration has become common with the advent of SoCs with integrated graphics processors (IGPs). We leverage additional parallelism through pipelined execution across CPU and GPU. For systems providing a unified address space between CPU and GPU, we employ zero-copy to completely eliminate the data transfer overhead. Our experimental evaluation of JParEnt was conducted on six heterogeneous multicore systems: one server and two desktops with dedicated GPUs, one desktop with an IGP, and two embedded systems. For a selection of more than 1000 JPEG images, JParEnt outperforms the SIMD implementation of the libjpeg-turbo library by up to a factor of 4.3×, and the previously fastest JPEG decompression method for heterogeneous multicores by up to a factor of 2.2×. JParEnt's entropy data scan consumes 45% of the entropy decoding time of libjpeg-turbo on average. Given this new ratio for the sequential part of JPEG decompression, JParEnt achieves up to 97% of the maximum attainable speedup (95% on average). On the IGP-based desktop platform, JParEnt achieves energy savings of up to 45% compared to libjpeg-turbo's SIMD implementation.
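
The two-step idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the prefix code `CODE`, the fixed block size `BLOCK_SYMBOLS`, and all function names are assumptions made for illustration, and a thread pool stands in for the GPU. A sequential pass walks the codewords only to record where each block starts; the blocks are then decoded independently of each other.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy prefix-free code (an assumption; JPEG uses per-image Huffman tables).
CODE = {"0": "a", "10": "b", "110": "c", "111": "d"}
ENCODE = {sym: bits for bits, sym in CODE.items()}
BLOCK_SYMBOLS = 4  # symbols per block; stands in for a JPEG coefficient block
MAX_LEN = max(len(bits) for bits in CODE)


def encode(text):
    """Concatenate the codewords of all symbols into one bit string."""
    return "".join(ENCODE[sym] for sym in text)


def next_codeword(bits, pos):
    """Return (symbol, new_pos) for the codeword starting at bit offset pos."""
    for length in range(1, MAX_LEN + 1):
        sym = CODE.get(bits[pos:pos + length])
        if sym is not None:
            return sym, pos + length
    raise ValueError(f"invalid codeword at bit {pos}")


def scan_block_starts(bits):
    """Step 1 (sequential): record the bit offset at which each block begins."""
    starts, pos, count = [], 0, 0
    while pos < len(bits):
        if count % BLOCK_SYMBOLS == 0:
            starts.append(pos)
        _, pos = next_codeword(bits, pos)  # symbol is skipped, only pos matters
        count += 1
    return starts


def decode_block(bits, start, end):
    """Step 2 worker: decode one block without looking at any other block."""
    out, pos = [], start
    while pos < end:
        sym, pos = next_codeword(bits, pos)
        out.append(sym)
    return "".join(out)


def decode_parallel(bits):
    """Run the boundary scan, then decode all blocks concurrently."""
    starts = scan_block_starts(bits)
    ends = starts[1:] + [len(bits)]
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda se: decode_block(bits, *se), zip(starts, ends))
        return "".join(parts)
```

Once the block start offsets are known, no block depends on the decoding of its predecessor, which is what makes the second step parallelizable. In CPython the thread pool only demonstrates the independence of the blocks; it is not expected to yield a real speedup.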

Original language: English
Article number: e4111
Journal: Concurrency Computation
Volume: 29
Issue number: 15
DOIs: 10.1002/cpe.4111
Publication status: Published - 2017 Aug 10

Fingerprint

Decoding
Entropy
Graphics Processors
Program processors
Graphics Processing Unit
Huffman Codes
Architecture
Data Transfer
Energy Saving
Experimental Evaluation
Embedded systems
Leverage
Parallelism
Workload
Partitioning
Energy conservation
Speedup
Computer systems

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Computer Science Applications
  • Computer Networks and Communications
  • Computational Theory and Mathematics

Cite this

@article{5ef1e9b3ed07413c88926988b86e4e67,
title = "{JParEnt}: Parallel entropy decoding for {JPEG} decompression on heterogeneous multicore architectures",
abstract = "The JPEG format employs Huffman codes to compress the entropy data of an image. Huffman codewords are of variable length, which makes parallel entropy decoding a difficult problem. To determine the start position of a codeword in the bitstream, the previous codeword must be decoded first. We present JParEnt, a new approach to parallel entropy decoding for JPEG decompression on heterogeneous multicores. JParEnt conducts JPEG decompression in two steps: (1) an efficient sequential scan of the entropy data on the CPU to determine the start-positions (boundaries) of coefficient blocks in the bitstream, followed by (2) a parallel entropy decoding step on the graphics processing unit (GPU). The block boundary scan constitutes a reinterpretation of the Huffman-coded entropy data to determine codeword boundaries in the bitstream. We introduce a dynamic workload partitioning scheme to account for GPUs of low compute power relative to the CPU. This configuration has become common with the advent of SoCs with integrated graphics processors (IGPs). We leverage additional parallelism through pipelined execution across CPU and GPU. For systems providing a unified address space between CPU and GPU, we employ zero-copy to completely eliminate the data transfer overhead. Our experimental evaluation of JParEnt was conducted on six heterogeneous multicore systems: one server and two desktops with dedicated GPUs, one desktop with an IGP, and two embedded systems. For a selection of more than 1000 JPEG images, JParEnt outperforms the SIMD implementation of the libjpeg-turbo library by up to a factor of 4.3×, and the previously fastest JPEG decompression method for heterogeneous multicores by up to a factor of 2.2×. JParEnt's entropy data scan consumes 45{\%} of the entropy decoding time of libjpeg-turbo on average. Given this new ratio for the sequential part of JPEG decompression, JParEnt achieves up to 97{\%} of the maximum attainable speedup (95{\%} on average). On the IGP-based desktop platform, JParEnt achieves energy savings of up to 45{\%} compared to libjpeg-turbo's SIMD implementation.",
author = "Wasuwee Sodsong and Minyoung Jung and Jinwoo Park and Bernd Burgstaller",
year = "2017",
month = "8",
day = "10",
doi = "10.1002/cpe.4111",
language = "English",
volume = "29",
journal = "Concurrency Computation Practice and Experience",
issn = "1532-0626",
publisher = "John Wiley and Sons Ltd",
number = "15",
}

JParEnt: Parallel entropy decoding for JPEG decompression on heterogeneous multicore architectures. / Sodsong, Wasuwee; Jung, Minyoung; Park, Jinwoo; Burgstaller, Bernd.

In: Concurrency Computation, Vol. 29, No. 15, e4111, 10.08.2017.

TY - JOUR

T1 - JParEnt

T2 - Parallel entropy decoding for JPEG decompression on heterogeneous multicore architectures

AU - Sodsong, Wasuwee

AU - Jung, Minyoung

AU - Park, Jinwoo

AU - Burgstaller, Bernd

PY - 2017/8/10

Y1 - 2017/8/10

N2 - The JPEG format employs Huffman codes to compress the entropy data of an image. Huffman codewords are of variable length, which makes parallel entropy decoding a difficult problem. To determine the start position of a codeword in the bitstream, the previous codeword must be decoded first. We present JParEnt, a new approach to parallel entropy decoding for JPEG decompression on heterogeneous multicores. JParEnt conducts JPEG decompression in two steps: (1) an efficient sequential scan of the entropy data on the CPU to determine the start-positions (boundaries) of coefficient blocks in the bitstream, followed by (2) a parallel entropy decoding step on the graphics processing unit (GPU). The block boundary scan constitutes a reinterpretation of the Huffman-coded entropy data to determine codeword boundaries in the bitstream. We introduce a dynamic workload partitioning scheme to account for GPUs of low compute power relative to the CPU. This configuration has become common with the advent of SoCs with integrated graphics processors (IGPs). We leverage additional parallelism through pipelined execution across CPU and GPU. For systems providing a unified address space between CPU and GPU, we employ zero-copy to completely eliminate the data transfer overhead. Our experimental evaluation of JParEnt was conducted on six heterogeneous multicore systems: one server and two desktops with dedicated GPUs, one desktop with an IGP, and two embedded systems. For a selection of more than 1000 JPEG images, JParEnt outperforms the SIMD implementation of the libjpeg-turbo library by up to a factor of 4.3×, and the previously fastest JPEG decompression method for heterogeneous multicores by up to a factor of 2.2×. JParEnt's entropy data scan consumes 45% of the entropy decoding time of libjpeg-turbo on average. Given this new ratio for the sequential part of JPEG decompression, JParEnt achieves up to 97% of the maximum attainable speedup (95% on average). On the IGP-based desktop platform, JParEnt achieves energy savings of up to 45% compared to libjpeg-turbo's SIMD implementation.

UR - http://www.scopus.com/inward/record.url?scp=85022023893&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85022023893&partnerID=8YFLogxK

U2 - 10.1002/cpe.4111

DO - 10.1002/cpe.4111

M3 - Article

VL - 29

JO - Concurrency Computation Practice and Experience

JF - Concurrency Computation Practice and Experience

SN - 1532-0626

IS - 15

M1 - e4111

ER -