Deep Monocular Depth Estimation via Integration of Global and Local Predictions

Youngjung Kim, Hyungjoo Jung, Dongbo Min, Kwanghoon Sohn

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Recent work on machine learning has greatly advanced the accuracy of single-image depth estimation. However, the resulting depth maps are still over-smoothed and perceptually unsatisfying. This paper casts depth prediction from a single image as a parametric learning problem. Specifically, we propose a deep variational model that effectively integrates heterogeneous predictions from two convolutional neural networks (CNNs), named global and local networks. They have contrasting network architectures and are designed to capture depth information with complementary attributes. These intermediate outputs are then combined in an integration network based on the variational framework. By unrolling the optimization steps of Split Bregman iterations in the integration network, our model can be trained in an end-to-end manner. This enables us to simultaneously learn an efficient parameterization of the CNNs and the hyper-parameters of the variational method. Finally, we offer a new data set of 0.22 million RGB-D images captured by Microsoft Kinect v2. Our model generates realistic and discontinuity-preserving depth predictions without involving any low-level segmentation or superpixels. Extensive experiments demonstrate the superiority of the proposed method on a range of RGB-D benchmarks, including both indoor and outdoor scenarios.
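
For intuition, below is a minimal sketch of the kind of unrolled Split Bregman fusion the abstract describes. The energy, its coefficients, and the inner gradient-descent solver are illustrative assumptions rather than the authors' exact formulation; in the paper the iterations are unrolled as differentiable network layers so that the CNN weights and the variational hyper-parameters are learned jointly, whereas here they are fixed constants. The sketch assumes the global prediction d_g anchors absolute depth and the local prediction d_l supplies fine gradients.

import numpy as np

def grad(u):
    # Forward differences along x and y with a replicated last column/row.
    dx = np.diff(u, axis=1, append=u[:, -1:])
    dy = np.diff(u, axis=0, append=u[-1:, :])
    return dx, dy

def div(px, py):
    # Discrete divergence, the (negative) adjoint of grad up to boundary terms.
    fx = np.diff(px, axis=1, prepend=px[:, :1])
    fy = np.diff(py, axis=0, prepend=py[:1, :])
    return fx + fy

def shrink(v, t):
    # Soft-thresholding: closed-form proximal step for the L1 term.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def integrate(d_g, d_l, lam=1.0, beta=5.0, K=10, inner=30):
    # Fuse a global depth map d_g and a local depth map d_l by (approximately)
    # minimising an assumed energy
    #     lam/2 * ||d - d_g||^2 + ||grad(d) - grad(d_l)||_1
    # with K unrolled Split Bregman iterations.
    d = d_g.astype(float).copy()
    gx_l, gy_l = grad(d_l)
    bx, by = np.zeros_like(d), np.zeros_like(d)   # auxiliary (splitting) variables
    cx, cy = np.zeros_like(d), np.zeros_like(d)   # Bregman variables
    tau = 1.0 / (lam + 8.0 * beta)                # safe step size for the inner solver
    for _ in range(K):
        # d-update: gradient steps on the quadratic subproblem
        # lam/2*||d - d_g||^2 + beta/2*||grad(d) - grad(d_l) - b + c||^2.
        for _ in range(inner):
            dx, dy = grad(d)
            rx = dx - gx_l - bx + cx
            ry = dy - gy_l - by + cy
            d = d - tau * (lam * (d - d_g) - beta * div(rx, ry))
        # b-update: elementwise shrinkage (exact minimiser of the L1 subproblem).
        dx, dy = grad(d)
        bx = shrink(dx - gx_l + cx, 1.0 / beta)
        by = shrink(dy - gy_l + cy, 1.0 / beta)
        # Bregman variable update.
        cx = cx + (dx - gx_l) - bx
        cy = cy + (dy - gy_l) - by
    return d

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_global = rng.random((64, 64))                    # stand-in for the global network output
    d_local = d_global + 0.05 * rng.random((64, 64))   # stand-in for the local network output
    print(integrate(d_global, d_local).shape)          # (64, 64)

In the published method the number of iterations K is fixed by the unrolled architecture and quantities such as lam and beta become trainable parameters, which is what allows the whole pipeline, CNNs plus integration, to be optimised end to end.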

Original language: English
Pages (from-to): 4131-4144
Number of pages: 14
Journal: IEEE Transactions on Image Processing
Volume: 27
Issue number: 8
DOIs: 10.1109/TIP.2018.2836318
Publication status: Published - 2018 Aug 1

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Graphics and Computer-Aided Design

Cite this

Kim, Youngjung; Jung, Hyungjoo; Min, Dongbo; Sohn, Kwanghoon. Deep Monocular Depth Estimation via Integration of Global and Local Predictions. In: IEEE Transactions on Image Processing. 2018; Vol. 27, No. 8, pp. 4131-4144.
@article{6cb36bcb34dc42f79eab6848523f5893,
  title = "Deep Monocular Depth Estimation via Integration of Global and Local Predictions",
  author = "Youngjung Kim and Hyungjoo Jung and Dongbo Min and Kwanghoon Sohn",
  year = "2018",
  month = "8",
  day = "1",
  doi = "10.1109/TIP.2018.2836318",
  language = "English",
  volume = "27",
  pages = "4131--4144",
  journal = "IEEE Transactions on Image Processing",
  issn = "1057-7149",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
  number = "8",
}
