Estimating depth from a single monocular image is a fundamental problem in computer vision. Traditional methods for such estimation usually require complicated and sometimes labor-intensive processing. In this paper, we take a new perspective on this problem and propose a gradient-domain learning framework that is much simpler and more efficient. Inspired by the observation that image edges and depth discontinuities co-occur substantially in natural scenes, we learn the relationship between local appearance features and the corresponding depth gradients by applying the K-means clustering algorithm in the image feature space. We then encode each cluster centroid with its associated depth gradients, which defines visual-depth words that model the image-depth relationship. This makes it possible to estimate the scene depth of an arbitrary image by simply selecting the proper depth gradients from a compact dictionary of visual-depth words, followed by a Poisson surface reconstruction. Experimental results demonstrate that the proposed gradient-domain approach outperforms state-of-the-art methods both qualitatively and quantitatively, and that it generalizes to unseen scene categories that are not used for training.
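The pipeline described in the abstract (cluster appearance features, attach depth gradients to each centroid, look up gradients for a new image, then integrate them) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the feature descriptor, the dictionary size, or the solver, so the sketch assumes that each visual-depth word stores the mean depth gradient of its cluster, that lookup is by nearest centroid, and that reconstruction is a dense least-squares integration of forward differences (whose normal equations form a discrete Poisson equation). All function names are hypothetical.

```python
import numpy as np

def nearest(X, centroids):
    """Index of the closest centroid (squared Euclidean) for each row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's K-means on the rows of X; returns (centroids, labels)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = nearest(X, centroids)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, nearest(X, centroids)

def build_dictionary(features, depth_grads, k):
    """Assumed encoding: each word is the mean depth gradient of its cluster."""
    depth_grads = np.asarray(depth_grads, dtype=float)
    centroids, labels = kmeans(features, k)
    words = np.zeros((k, depth_grads.shape[1]))
    for j in range(k):
        members = depth_grads[labels == j]
        if len(members):
            words[j] = members.mean(axis=0)
    return centroids, words

def lookup_gradients(features, centroids, words):
    """Select a depth gradient for each feature via its nearest centroid."""
    return words[nearest(np.asarray(features, dtype=float), centroids)]

def integrate_gradients(gx, gy):
    """Least-squares depth from forward differences.

    gx has shape (H, W-1) (horizontal), gy has shape (H-1, W) (vertical).
    The depth at pixel (0, 0) is pinned to 0 to fix the unknown offset.
    """
    H, W = gx.shape[0], gy.shape[1]
    idx = np.arange(H * W).reshape(H, W)
    A = np.zeros((gx.size + gy.size + 1, H * W))
    b = np.concatenate([gx.ravel(), gy.ravel(), [0.0]])
    r = np.arange(gx.size)                # rows for z[y, x+1] - z[y, x] = gx
    A[r, idx[:, 1:].ravel()] = 1.0
    A[r, idx[:, :-1].ravel()] = -1.0
    r = gx.size + np.arange(gy.size)      # rows for z[y+1, x] - z[y, x] = gy
    A[r, idx[1:, :].ravel()] = 1.0
    A[r, idx[:-1, :].ravel()] = -1.0
    A[-1, 0] = 1.0                        # anchor: z[0, 0] = 0
    z, *_ = np.linalg.lstsq(A, b, rcond=None)
    return z.reshape(H, W)
```

The dense solve is only practical for small images; a sparse or transform-based Poisson solver would be the usual choice at full resolution. Depth recovered this way is defined only up to an additive constant, which the sketch resolves by anchoring one pixel.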
|Title of host publication||2015 IEEE International Conference on Image Processing, ICIP 2015 - Proceedings|
|Publisher||IEEE Computer Society|
|Number of pages||5|
|Publication status||Published - 2015 Dec 9|
|Event||IEEE International Conference on Image Processing, ICIP 2015 - Quebec City, Canada (Duration: 2015 Sep 27 → 2015 Sep 30)|
|Name||Proceedings - International Conference on Image Processing, ICIP|
|Other||IEEE International Conference on Image Processing, ICIP 2015|
|Period||2015 Sep 27 → 2015 Sep 30|
Bibliographical note
Publisher Copyright:
© 2015 IEEE.
All Science Journal Classification (ASJC) codes
- Computer Vision and Pattern Recognition
- Signal Processing