This paper presents an algorithm to reconstruct temporally consistent 3D meshes of deformable object instances from videos in the wild. Without requiring annotations of 3D meshes, 2D keypoints, or camera poses for each video frame, we pose video-based reconstruction as a self-supervised online adaptation problem applied to any incoming test video. We first learn a category-specific 3D reconstruction model from a collection of single-view images of the same category that jointly predicts the shape, texture, and camera pose of an image. Then, at inference time, we adapt the model to a test video over time using self-supervised regularization terms that exploit the temporal consistency of an object instance to enforce that all reconstructed meshes share a common texture map, base shape, and parts. We demonstrate that our algorithm recovers temporally consistent and reliable 3D structures from videos of non-rigid objects, including animals captured in the wild – an extremely challenging task rarely addressed before. Code and other resources will be released at https://sites.google.com/nvidia.com/vmr-2020.
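The online adaptation described above can be illustrated with a minimal toy sketch. The function and variable names below are hypothetical and greatly simplified (1-D "shapes" in place of meshes, plain gradient descent in place of the paper's rendering-based losses): per-frame parameters are fit to each frame's observation while a regularizer pulls every frame toward a jointly updated shared base shape, mimicking the role of the temporal-consistency terms.

```python
import numpy as np

def adapt_to_video(frames, init_shape, n_steps=500, lr=0.1, reg=1.0):
    """Toy online-adaptation sketch (illustrative, not the paper's method).

    Each frame gets its own shape vector, fit to that frame's observation
    (data term) while being regularized toward a shared base shape
    (consistency term). The base shape is updated jointly so it tracks
    the mean of the per-frame shapes.
    """
    per_frame = np.tile(init_shape, (len(frames), 1)).astype(float)
    base = np.asarray(init_shape, dtype=float).copy()
    for _ in range(n_steps):
        # data term: each frame's shape should explain its observation;
        # consistency term: it should also stay close to the base shape
        grad_frames = (per_frame - frames) + reg * (per_frame - base)
        # the base shape is pulled toward the average per-frame shape
        grad_base = reg * (base - per_frame.mean(axis=0))
        per_frame -= lr * grad_frames
        base -= lr * grad_base
    return per_frame, base

# toy "video": three noisy observations of a deforming 2-parameter shape
frames = np.array([[1.0, 2.0], [1.2, 2.2], [0.8, 1.8]])
per_frame, base = adapt_to_video(frames, init_shape=np.zeros(2))
```

At convergence the shared base shape settles near the mean of the per-frame fits, and each per-frame shape is a compromise between its own observation and that base, which is the qualitative behavior the paper's consistency regularizers are designed to produce.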
Journal: Advances in Neural Information Processing Systems
Publication status: Published - 2020
Event: 34th Conference on Neural Information Processing Systems, NeurIPS 2020 - Virtual, Online
Duration: 2020 Dec 6 → 2020 Dec 12
Bibliographical note (Publisher Copyright):
© 2020 Neural information processing systems foundation. All rights reserved.
All Science Journal Classification (ASJC) codes
- Computer Networks and Communications
- Information Systems
- Signal Processing