This paper describes a non-convex model carefully designed for high-quality depth upsampling. Modern depth sensors such as time-of-flight cameras provide promising depth measurements at video rate, but suffer from noise and low resolution. To tackle these limitations, we formulate an optimization problem using a robust potential function. In this formulation, a nonlocal principle established in a high-dimensional feature space is used to disambiguate the upsampling problem. We also derive a numerical algorithm based on the majorization-minimization approach for efficient optimization. At each iteration, the proposed model creates a new affinity space that determines the influence of neighboring pixels by jointly considering spatial distance, appearance, and the current depth estimates. This behavior significantly reduces artifacts on a variety of range datasets, including challenging real measurements. Extensive experiments demonstrate that the proposed model achieves performance competitive with state-of-the-art methods.
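The abstract does not specify the robust potential or the affinity terms, so the following is only a minimal sketch of how majorization-minimization (MM) turns a robust, non-convex smoothness term into a sequence of weighted least-squares problems whose weights depend on the current estimate. It assumes a Welsch-type potential psi(x) = 1 - exp(-x^2 / (2 sigma_d^2)), Gaussian spatial and guide-intensity weights, a 1D signal, and dense linear solves. The functions `welsch`, `energy`, `mm_upsample_1d` and every parameter value are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np


def welsch(x, sigma_d):
    """Welsch-type robust potential (an assumed choice for illustration)."""
    return 1.0 - np.exp(-x**2 / (2.0 * sigma_d**2))


def energy(u, d, m, w, lam, sigma_d):
    """E(u) = sum_p m_p (u_p - d_p)^2 + lam * sum_{p<q} w_pq * psi(u_p - u_q)."""
    data = np.sum(m * (u - d) ** 2)
    diff = u[:, None] - u[None, :]
    # w is symmetric with a zero diagonal, so half the full sum counts each pair once.
    smooth = 0.5 * np.sum(w * welsch(diff, sigma_d))
    return data + lam * smooth


def mm_upsample_1d(d, mask, guide, lam=1.0, sigma_s=2.0, sigma_c=0.1,
                   sigma_d=0.2, radius=3, iters=10):
    """Hypothetical MM sketch: upsample sparse depth d (valid where mask) using guide."""
    n = guide.size
    idx = np.arange(n)
    dist = idx[:, None] - idx[None, :]
    # Fixed part of the affinity: spatial distance times guide-intensity similarity.
    w = (np.exp(-dist**2 / (2.0 * sigma_s**2))
         * np.exp(-(guide[:, None] - guide[None, :])**2 / (2.0 * sigma_c**2)))
    w[np.abs(dist) > radius] = 0.0
    np.fill_diagonal(w, 0.0)

    m = mask.astype(float)
    d = np.where(mask, d, 0.0)
    # Start from plain linear interpolation of the sparse samples.
    u = np.interp(idx, idx[mask], d[mask])
    energies = [energy(u, d, m, w, lam, sigma_d)]
    for _ in range(iters):
        diff = u[:, None] - u[None, :]
        # As a function of t = x^2 the Welsch potential is concave, so its tangent at
        # the current estimate is a quadratic upper bound with slope
        # exp(-t_k / (2 sigma_d^2)) / (2 sigma_d^2).
        a = np.exp(-diff**2 / (2.0 * sigma_d**2)) / (2.0 * sigma_d**2)
        k = w * a  # affinity combining space, appearance, and the current depth
        L = np.diag(k.sum(axis=1)) - k  # weighted graph Laplacian
        # The minimizer of the quadratic surrogate solves (M + lam * L) u = M d.
        u = np.linalg.solve(np.diag(m) + lam * L, m * d)
        energies.append(energy(u, d, m, w, lam, sigma_d))
    return u, energies


# Toy example: a depth step aligned with an intensity edge, sampled every 4th pixel.
n = 40
idx = np.arange(n)
true_depth = np.where(idx < 20, 1.0, 2.0)
guide = np.where(idx < 20, 0.0, 1.0)
mask = idx % 4 == 0
u, energies = mm_upsample_1d(true_depth, mask, guide)
```

Because each surrogate touches the energy at the current estimate and lies above it everywhere, every linear solve cannot increase E(u); this monotone decrease is the usual reason MM is chosen for such non-convex models. In the toy example, linear interpolation blurs the step between pixels 16 and 20, while the reweighted solution keeps it sharp. For real images, the dense matrices here would be replaced by sparse ones.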