Video object segmentation using space-time memory networks

Seoung Wug Oh, Joon Young Lee, Ning Xu, Seon Joo Kim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

252 Citations (Scopus)

Abstract

We propose a novel solution for semi-supervised video object segmentation. By the nature of the problem, available cues (e.g., video frame(s) with object masks) become richer with the intermediate predictions. However, existing methods are unable to fully exploit this rich source of information. We resolve the issue by leveraging memory networks and learning to read relevant information from all available sources. In our framework, the past frames with object masks form an external memory, and the current frame, as the query, is segmented using the mask information in the memory. Specifically, the query and the memory are densely matched in the feature space, covering all the space-time pixel locations in a feed-forward fashion. In contrast to previous approaches, the abundant use of the guidance information allows us to better handle challenges such as appearance changes and occlusions. We validate our method on the latest benchmark sets and achieve state-of-the-art performance (overall score of 79.4 on the YouTube-VOS val set, J of 88.7 and 79.2 on the DAVIS 2016/2017 val sets, respectively) while having a fast runtime (0.16 second/frame on the DAVIS 2016 val set).
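
The dense matching between query and memory described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity function or the feature layout, so the dot-product similarity, the softmax normalisation, the key/value split, and the names memory_read, query_key, memory_key and memory_value are all assumptions made for illustration.

```python
import numpy as np

def memory_read(query_key, memory_key, memory_value):
    """Soft read over all space-time memory locations (illustrative sketch).

    query_key:    (HW, Ck)    one key vector per pixel of the current frame
    memory_key:   (T*HW, Ck)  one key vector per pixel of T past frames
    memory_value: (T*HW, Cv)  value vector per memory pixel (assumed to carry
                              the mask information)
    returns:      (HW, Cv)    value read from memory for each query pixel
    """
    # Assumption: similarity is a plain dot product between key vectors.
    scores = query_key @ memory_key.T                  # (HW, T*HW)
    # Assumption: softmax over memory locations, shifted for numerical stability.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Each query pixel retrieves a weighted sum of memory values.
    return weights @ memory_value

# Hypothetical sizes: 2 memory frames of 4x4 pixels, 8-dim keys, 16-dim values.
rng = np.random.default_rng(0)
T, HW, Ck, Cv = 2, 16, 8, 16
read = memory_read(rng.normal(size=(HW, Ck)),
                   rng.normal(size=(T * HW, Ck)),
                   rng.normal(size=(T * HW, Cv)))
```

Because every query pixel is compared with every memory pixel in a single matrix product, the read is one feed-forward pass, and adding more past frames only lengthens the memory axis.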

Original language: English
Title of host publication: Proceedings - 2019 International Conference on Computer Vision, ICCV 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 9225-9234
Number of pages: 10
ISBN (Electronic): 9781728148038
DOIs
Publication status: Published - 2019 Oct
Event: 17th IEEE/CVF International Conference on Computer Vision, ICCV 2019 - Seoul, Korea, Republic of
Duration: 2019 Oct 27 to 2019 Nov 2

Publication series

Name: Proceedings of the IEEE International Conference on Computer Vision
Volume: 2019-October
ISSN (Print): 1550-5499

Conference

Conference: 17th IEEE/CVF International Conference on Computer Vision, ICCV 2019
Country/Territory: Korea, Republic of
City: Seoul
Period: 19/10/27 to 19/11/2

Bibliographical note

Funding Information:
Acknowledgment. This work is supported by the ICT R&D program of MSIT/IITP (2017-0-01772, Development of QA systems for Video Story Understanding to pass the Video Turing Test).

Publisher Copyright:
© 2019 IEEE.

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Vision and Pattern Recognition
