Weakly-supervised disentangling with recurrent transformations for 3D view synthesis

Jimei Yang, Scott Reed, Ming Hsuan Yang, Honglak Lee

Research output: Contribution to journal › Conference article

103 Citations (Scopus)

Abstract

An important problem for both graphics and vision is to synthesize novel views of a 3D object from a single image. This is particularly challenging due to the partial observability inherent in projecting a 3D object onto the image space, and the ill-posedness of inferring object shape and pose. However, we can train a neural network to address the problem if we restrict our attention to specific object categories (in our case faces and chairs) for which we can gather ample training data. In this paper, we propose a novel recurrent convolutional encoder-decoder network that is trained end-to-end on the task of rendering rotated objects starting from a single image. The recurrent structure allows our model to capture long-term dependencies along a sequence of transformations. We demonstrate the quality of its predictions for human faces on the Multi-PIE dataset and for a dataset of 3D chair models, and also show its ability to disentangle latent factors of variation (e.g., identity and pose) without using full supervision.
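The recurrent rollout described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: small linear layers with random, untrained weights stand in for the convolutional encoder and decoder, and the class name `RecurrentViewSynthesizer`, the latent sizes, and the three-way one-hot rotation action are all assumptions made for illustration. The one idea it tries to show is that the identity code is computed once and held fixed while only the pose code is updated at each step of the transformation sequence.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class RecurrentViewSynthesizer:
    """Hypothetical toy model: linear layers replace the conv encoder/decoder."""

    def __init__(self, image_dim, identity_dim, pose_dim, action_dim=3, seed=0):
        rng = random.Random(seed)
        self.identity_dim = identity_dim
        self.encoder = random_matrix(identity_dim + pose_dim, image_dim, rng)
        self.pose_step = random_matrix(pose_dim, pose_dim + action_dim, rng)
        self.decoder = random_matrix(image_dim, identity_dim + pose_dim, rng)

    def encode(self, image):
        """Map a flattened image to separate identity and pose codes."""
        latent = [math.tanh(v) for v in matvec(self.encoder, image)]
        return latent[:self.identity_dim], latent[self.identity_dim:]

    def rotate(self, pose, action):
        """Recurrent step: update only the pose code, given a one-hot action."""
        return [math.tanh(v) for v in matvec(self.pose_step, pose + action)]

    def decode(self, identity, pose):
        """Render a flattened image from the identity and pose codes."""
        return matvec(self.decoder, identity + pose)

    def rollout(self, image, actions):
        """Encode once, then render one view per action in the sequence."""
        identity, pose = self.encode(image)
        views = []
        for action in actions:
            pose = self.rotate(pose, action)  # identity stays fixed
            views.append(self.decode(identity, pose))
        return views


# Assumed action encoding: [1,0,0] and [0,0,1] rotate in opposite directions,
# [0,1,0] keeps the current view.
model = RecurrentViewSynthesizer(image_dim=16, identity_dim=4, pose_dim=2)
views = model.rollout([0.5] * 16, [[1, 0, 0], [1, 0, 0], [0, 0, 1]])
```

In a trained version of such a model, each element of `views` would be compared against the ground-truth image of the object at the corresponding rotation, so that the loss accumulates along the whole sequence.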

Original language: English
Pages (from-to): 1099-1107
Number of pages: 9
Journal: Advances in Neural Information Processing Systems
Volume: 2015-January
Publication status: Published - 2015 Jan 1
Event: 29th Annual Conference on Neural Information Processing Systems, NIPS 2015 - Montreal, Canada
Duration: 2015 Dec 7 → 2015 Dec 12

Fingerprint

Observability
Neural networks

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Cite this

@article{59604ad8ad9d4b53848274c7b0f0c8e5,
title = "Weakly-supervised disentangling with recurrent transformations for 3D view synthesis",
author = "Jimei Yang and Scott Reed and Yang, {Ming Hsuan} and Honglak Lee",
year = "2015",
month = "1",
day = "1",
language = "English",
volume = "2015-January",
pages = "1099--1107",
journal = "Advances in Neural Information Processing Systems",
issn = "1049-5258",
}

Weakly-supervised disentangling with recurrent transformations for 3D view synthesis. / Yang, Jimei; Reed, Scott; Yang, Ming Hsuan; Lee, Honglak.

In: Advances in Neural Information Processing Systems, Vol. 2015-January, 01.01.2015, p. 1099-1107.

TY - JOUR

T1 - Weakly-supervised disentangling with recurrent transformations for 3D view synthesis

AU - Yang, Jimei

AU - Reed, Scott

AU - Yang, Ming Hsuan

AU - Lee, Honglak

PY - 2015/1/1

Y1 - 2015/1/1

N2 - An important problem for both graphics and vision is to synthesize novel views of a 3D object from a single image. This is particularly challenging due to the partial observability inherent in projecting a 3D object onto the image space, and the ill-posedness of inferring object shape and pose. However, we can train a neural network to address the problem if we restrict our attention to specific object categories (in our case faces and chairs) for which we can gather ample training data. In this paper, we propose a novel recurrent convolutional encoder-decoder network that is trained end-to-end on the task of rendering rotated objects starting from a single image. The recurrent structure allows our model to capture long-term dependencies along a sequence of transformations. We demonstrate the quality of its predictions for human faces on the Multi-PIE dataset and for a dataset of 3D chair models, and also show its ability to disentangle latent factors of variation (e.g., identity and pose) without using full supervision.

UR - http://www.scopus.com/inward/record.url?scp=84965161391&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84965161391&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:84965161391

VL - 2015-January

SP - 1099

EP - 1107

JO - Advances in Neural Information Processing Systems

JF - Advances in Neural Information Processing Systems

SN - 1049-5258

ER -