We present a multi-scale deep convolutional neural network (CNN) for automatic 2D-to-3D conversion. Traditional methods, which synthesize a virtual view from a reference view, consist of separate stages: depth (or disparity) estimation for the reference image, followed by depth image-based rendering (DIBR) with the estimated depth. In contrast, we reformulate view synthesis as an image reconstruction problem using a spatial transformer module and directly generate stereo image pairs within a unified CNN framework, without ground-truth depth as supervision. We further propose a multi-scale deep architecture that captures large displacements between views at coarse scales and enhances details at fine scales. Experimental results on the KITTI driving dataset demonstrate the effectiveness of the proposed method over state-of-the-art approaches, both qualitatively and quantitatively.
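The core of the reconstruction-based formulation is a differentiable horizontal warp: the network predicts a disparity map, and a spatial-transformer-style bilinear sampler shifts the reference image along the baseline to synthesize the other view, so the photometric reconstruction loss can supervise the disparity without ground-truth depth. Below is a minimal NumPy sketch of that sampling step, assuming a single-channel image and a precomputed disparity map; `warp_horizontal` is an illustrative name, not from the paper.

```python
import numpy as np

def warp_horizontal(img, disparity):
    """Synthesize a novel view by sampling each output pixel from `img` at a
    horizontally shifted location, with bilinear interpolation along x (the
    differentiable sampling a spatial transformer performs along the baseline).

    img:       (H, W) reference image.
    disparity: (H, W) horizontal offsets in pixels; output(x) = img(x - d).
    """
    H, W = img.shape
    xs = np.arange(W)[None, :] - disparity          # source x per output pixel
    x0 = np.clip(np.floor(xs).astype(int), 0, W - 1)  # left neighbor (clamped)
    x1 = np.clip(x0 + 1, 0, W - 1)                    # right neighbor (clamped)
    w1 = np.clip(xs - x0, 0.0, 1.0)                   # fractional bilinear weight
    rows = np.arange(H)[:, None]                      # row index for fancy indexing
    return (1.0 - w1) * img[rows, x0] + w1 * img[rows, x1]
```

In the actual network this sampling would be implemented with differentiable ops (e.g. a bilinear sampler layer) so gradients flow from the reconstruction loss back into the disparity prediction; the NumPy version only illustrates the forward warp.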
Title of host publication: 2017 IEEE International Conference on Image Processing, ICIP 2017 - Proceedings
Publisher: IEEE Computer Society
Number of pages: 5
Publication status: Published - 2018 Feb 20
Event: 24th IEEE International Conference on Image Processing, ICIP 2017 - Beijing, China (2017 Sep 17 → 2017 Sep 20)
Series: Proceedings - International Conference on Image Processing, ICIP
Bibliographical note (Funding Information):
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (NRF-2016R1A2A2A05921659).
© 2017 IEEE.
All Science Journal Classification (ASJC) codes
- Computer Vision and Pattern Recognition
- Signal Processing