Joint-task self-supervised learning for temporal correspondence

Xueting Li, Sifei Liu, Shalini de Mello, Xiaolong Wang, Jan Kautz, Ming Hsuan Yang

Research output: Contribution to journalConference articlepeer-review

9 Citations (Scopus)

Abstract

This paper proposes to learn reliable dense correspondence from videos in a self-supervised manner. Our learning process integrates two highly related tasks: tracking large image regions and establishing fine-grained pixel-level associations between consecutive video frames. We exploit the synergy between both tasks through a shared inter-frame affinity matrix, which simultaneously models transitions between video frames at both the region- and pixel-levels. While region-level localization helps reduce ambiguities in fine-grained matching by narrowing down search regions; fine-grained matching provides bottom-up features to facilitate region-level localization. Our method outperforms the state-of-the-art self-supervised methods on a variety of visual correspondence tasks, including video-object and part-segmentation propagation, keypoint tracking, and object tracking. Our self-supervised method even surpasses the fully-supervised affinity feature representation obtained from a ResNet-18 pre-trained on the ImageNet. The project website can be found at https://sites.google.com/view/uvc2019/.

Original languageEnglish
JournalAdvances in Neural Information Processing Systems
Volume32
Publication statusPublished - 2019
Event33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019 - Vancouver, Canada
Duration: 2019 Dec 82019 Dec 14

Bibliographical note

Publisher Copyright:
© 2019 Neural information processing systems foundation. All rights reserved.

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Fingerprint

Dive into the research topics of 'Joint-task self-supervised learning for temporal correspondence'. Together they form a unique fingerprint.

Cite this