TY - JOUR
T1 - OpenCL-based optimization methods for utilizing forward DCT and quantization of image compression on a heterogeneous platform
AU - Alqudami, Nasser
AU - Kim, Shin Dug
N1 - Publisher Copyright: © 2015, Springer-Verlag Berlin Heidelberg. Copyright 2018 Elsevier B.V., All rights reserved.
PY - 2016/8/1
Y1 - 2016/8/1
AB - Recent computer systems and handheld devices are equipped with high computing capability, such as general-purpose GPUs (GPGPUs) and multi-core CPUs. Utilizing such resources for computation has become a general trend, which makes their availability an important issue for real-time processing. Discrete cosine transform (DCT) and quantization are two major operations in image compression standards, and both require complex computations. In this paper, we develop an efficient parallel implementation of the forward DCT and quantization algorithms for JPEG image compression using the Open Computing Language (OpenCL). This OpenCL-based parallel implementation utilizes a multi-core CPU and a GPGPU to perform the DCT and quantization computations. We demonstrate the capability of this design via two proposed working scenarios. The proposed approach also applies certain optimization techniques to improve kernel execution time and data movement. We developed an optimal OpenCL kernel for a particular device using device-based optimization factors, such as thread granularity, work-item mapping, workload allocation, and vector-based memory access. We evaluated the performance in a heterogeneous environment and found that the proposed parallel implementation was able to speed up the DCT and quantization by factors of 7.97 and 8.65, respectively, obtained from 1024 × 1024 and 2048 × 2048 image sizes in 4:4:4 format.
UR - http://www.scopus.com/inward/record.url?scp=84929688045&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84929688045&partnerID=8YFLogxK
U2 - 10.1007/s11554-015-0507-5
DO - 10.1007/s11554-015-0507-5
M3 - Article
AN - SCOPUS:84929688045
VL - 12
SP - 219
EP - 235
JO - Journal of Real-Time Image Processing
JF - Journal of Real-Time Image Processing
SN - 1861-8200
IS - 2
ER -