Unsupervised keypoint learning for guiding class-conditional video prediction

Yunji Kim, Seonghyeon Nam, In Cho, Seon Joo Kim

Research output: Contribution to journal › Conference article › peer-review

20 Citations (Scopus)

Abstract

We propose a deep video prediction model conditioned on a single image and an action class. To generate future frames, we first detect keypoints of a moving object and predict future motion as a sequence of keypoints. The input image is then translated following the predicted keypoint sequence to compose future frames. Detecting the keypoints is central to our algorithm, and our method is trained to detect the keypoints of arbitrary objects in an unsupervised manner. Moreover, the keypoints detected in the original videos are used as pseudo-labels to learn the motion of objects. Experimental results show that our method can be applied to various datasets without the cost of labeling keypoints in videos. The detected keypoints are similar to human-annotated labels, and the prediction results are more realistic than those of previous methods.
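
The three stages named in the abstract (keypoint detection, keypoint-sequence prediction, image translation) can be read as a simple data flow. The sketch below is only an illustration of that flow and of the array shapes involved; it is not the authors' model. Every function name, the `ACTION_VELOCITY` table, the brightest-pixel "detector", the constant-velocity "motion model", and the `np.roll`-based "translator" are hypothetical stand-ins for networks that the paper learns, and the input is assumed to be a 2D grayscale image.

```python
import numpy as np

# Hypothetical per-class motion: (row, col) displacement per time step.
ACTION_VELOCITY = {0: np.array([0.0, 1.0]), 1: np.array([1.0, 0.0])}


def detect_keypoints(image, num_keypoints):
    """Toy stand-in for the learned detector: the brightest pixels, as (K, 2) coords."""
    flat = np.argsort(image.ravel())[-num_keypoints:]
    rows, cols = np.unravel_index(flat, image.shape)
    return np.stack([rows, cols], axis=1).astype(float)


def predict_keypoint_sequence(keypoints, action_class, num_steps):
    """Toy stand-in for the motion predictor: constant drift chosen by action class."""
    velocity = ACTION_VELOCITY[action_class]
    steps = np.arange(1, num_steps + 1).reshape(-1, 1, 1)
    return keypoints[None] + steps * velocity  # shape (T, K, 2)


def translate_image(image, source_kp, target_kp):
    """Toy stand-in for the translator: shift the image by the mean keypoint motion."""
    shift = np.rint((target_kp - source_kp).mean(axis=0)).astype(int)
    return np.roll(image, shift=tuple(int(s) for s in shift), axis=(0, 1))


def predict_video(image, action_class, num_steps=3, num_keypoints=4):
    """Chain the stages: image -> keypoints -> future keypoints -> future frames."""
    start_kp = detect_keypoints(image, num_keypoints)
    future_kp = predict_keypoint_sequence(start_kp, action_class, num_steps)
    frames = np.stack([translate_image(image, start_kp, kp) for kp in future_kp])
    return future_kp, frames


if __name__ == "__main__":
    image = np.zeros((8, 8))
    image[2, 3] = 1.0  # a single bright "object"
    future_kp, frames = predict_video(image, action_class=0, num_keypoints=1)
    print(future_kp.shape, frames.shape)
```

In the toy run, the single bright pixel moves one column to the right per predicted frame under action class 0. The point of the sketch is the interface between stages: the translator only sees the input image and keypoints, which is what allows detected keypoints to serve as the intermediate representation.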

Original language: English
Journal: Advances in Neural Information Processing Systems
Volume: 32
Publication status: Published - 2019
Event: 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019 - Vancouver, Canada
Duration: 2019 Dec 8 → 2019 Dec 14

Bibliographical note

Funding Information:
Acknowledgement: This work was supported by the Samsung Research Funding Center of Samsung Electronics under Project Number SRFC-IT1701-01.

Publisher Copyright:
© 2019 Neural Information Processing Systems Foundation. All rights reserved.

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
