Discrete cosine transform (DCT) is one of the major operations in image compression standards and it requires intensive and complex computations. Recent computer systems and handheld devices are equipped with high computing capability devices such as a general-purpose graphics processing unit (GPGPU) in addition to the traditional multicores CPU. We develop an optimized parallel implementation of the forward DCT algorithm for the JPEG image compression using the recently proposed Open Computing Language (OpenCL). This OpenCL parallel implementation combines a multicore CPU and a GPGPU in a single solution to perform DCT computations in an efficient manner by applying certain optimization techniques to enhance the kernel execution time and data movements. A separate optimal OpenCL kernel code was developed (CPU-based and GPU-based kernels) based on certain appropriate device-based optimization factors, such as threadmapping, thread granularity, vector-based memory access, and the given workload. The performance of DCT is evaluated on a heterogeneous environment and our OpenCL parallel implementation results in speeding up the execution of the DCT by the factors of 3.68 and 5.58 for different image sizes and formats in terms of workload allocations and data transfer mechanisms. The obtained speedup indicates the scalability of the DCT performance. 2014 SPIE and IS&T.
|Journal||Journal of Electronic Imaging|
|Publication status||Published - 2014 Nov 1|
Bibliographical noteFunding Information:
This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and future Planning (NRF-2012R1A1A2043400).
© 2020 Royal Society of Chemistry. All rights reserved.
All Science Journal Classification (ASJC) codes
- Atomic and Molecular Physics, and Optics
- Computer Science Applications
- Electrical and Electronic Engineering