Summary
With the emergence of social networks and improvements in computational photography, billions of JPEG images are shared and viewed on a daily basis. Desktops, tablets, and smartphones constitute the vast majority of hardware platforms used for displaying JPEG images. Although these platforms are heterogeneous multicores, no approach exists yet that is capable of joining the forces of a system's CPU and graphics processing unit (GPU) for JPEG decoding. In this paper, we introduce a novel JPEG decoding scheme for heterogeneous architectures consisting of a CPU and a general-purpose GPU. We employ an offline profiling step to determine the performance of a system's CPU and GPU with respect to JPEG decoding. For a given JPEG image, our performance model uses (1) the CPU and GPU performance characteristics, (2) the image entropy, and (3) the width and height of the image to balance the JPEG decoding workload on the underlying hardware. Our run-time partitioning and scheduling scheme exploits task, data, and pipeline parallelism by scheduling the non-parallelizable entropy-decoding task on the CPU, whereas inverse discrete cosine transformations, color conversions, and upsampling are conducted on both the CPU and the GPU. We have implemented the proposed method in the context of the libjpeg-turbo library, which is an industrial-strength JPEG encoding and decoding engine. Libjpeg-turbo's hand-optimized SIMD routines for ARM and x86 architectures constitute a competitive yardstick for the comparison with the proposed approach. We have evaluated our approach for a total of 7194 JPEG images across four high-end and middle-end CPU-GPU combinations including a mobile GPU. We achieve speedups of up to 5.2× over the SIMD version of libjpeg-turbo, and speedups of up to 10.5× over its sequential code. Taking into account the non-parallelizable JPEG entropy-decoding part, our approach achieves up to 97% of the theoretically attainable maximal speedup, with an average of 94%.
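The workload balancing and the speedup bound mentioned in the summary can be illustrated with a deliberately simplified sketch. It is not the performance model of the paper: it assumes that profiling yields one constant throughput per device (the hypothetical parameters `cpu_rate` and `gpu_rate`, in image rows per millisecond), it ignores image entropy and transfer costs, and it uses an Amdahl-style bound in which the entropy-decoding time cannot be reduced. All function names are made up for illustration.

```python
def split_rows(height, cpu_rate, gpu_rate, block=8):
    """Split image rows between CPU and GPU in proportion to throughput.

    Assumption: cpu_rate and gpu_rate are profiled throughputs (rows/ms).
    The GPU part is rounded to a multiple of `block` rows, since JPEG
    operates on 8x8 blocks.
    """
    gpu_share = gpu_rate / (cpu_rate + gpu_rate)
    gpu_rows = int(round(height * gpu_share / block)) * block
    gpu_rows = min(gpu_rows, height)
    return height - gpu_rows, gpu_rows


def max_speedup(t_entropy, t_parallel):
    """Amdahl-style upper bound on speedup.

    Assumption: entropy decoding (t_entropy) stays sequential on the CPU,
    and the remaining work (t_parallel) can at best be hidden completely.
    """
    return (t_entropy + t_parallel) / t_entropy


def fraction_of_bound(achieved_speedup, t_entropy, t_parallel):
    """Share of the attainable maximal speedup that was actually reached."""
    return achieved_speedup / max_speedup(t_entropy, t_parallel)


if __name__ == "__main__":
    # Made-up numbers: a GPU three times as fast as the CPU, 1080 rows.
    cpu_rows, gpu_rows = split_rows(1080, cpu_rate=1.0, gpu_rate=3.0)
    print(cpu_rows, gpu_rows)
    # Made-up timings: 2 ms entropy decoding, 8 ms for all remaining stages.
    print(max_speedup(2.0, 8.0))
    print(fraction_of_bound(4.7, 2.0, 8.0))
```

With these made-up inputs, the faster device receives the larger share of rows, and a measured speedup can be expressed as a fraction of the bound, which is the kind of figure the 97% and 94% values in the summary refer to.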
Bibliographical note
Funding Information:
This project has been supported by LG Electronics, by the National Research Foundation of Korea (NRF) funded by the Korean government (MEST) under grant number 2012K2A1A9054713, and by the Austrian Science Fund (FWF) project I 1035N23.
Copyright © 2015 John Wiley & Sons, Ltd.
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Computer Science Applications
- Computer Networks and Communications
- Computational Theory and Mathematics