We propose a novel and unified solution for user-guided video object segmentation tasks. We consider two scenarios of user-guided segmentation: semi-supervised and interactive segmentation. By the nature of the problem, the available cues, video frame(s) with object masks (or scribbles), become richer as intermediate predictions (or additional user inputs) accumulate. However, existing methods cannot fully exploit this rich source of information. We resolve the issue by leveraging memory networks and learning to read relevant information from all available sources. In the semi-supervised scenario, the previous frames with object masks form an external memory, and the current frame, serving as the query, is segmented using the information in the memory. Similarly, to work with user interactions, the frames that receive user inputs form the memory that guides segmentation. Internally, the query and the memory are densely matched in the feature space, covering all space-time pixel locations in a feed-forward fashion. This abundant use of guidance information allows us to better handle challenges such as appearance changes and occlusions. We validate our method on the latest benchmark sets and achieve state-of-the-art performance along with a fast runtime.
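The dense matching between the query and the memory can be illustrated with a minimal NumPy sketch. The abstract does not specify the matching function, so the split into key and value feature maps, the dot-product similarity, and the softmax normalization used here are all assumptions, and the name `memory_read` is hypothetical.

```python
import numpy as np


def memory_read(query_key, memory_keys, memory_values):
    """Read from a space-time memory with soft attention (illustrative only).

    query_key:     (H, W, Ck)    features of the current (query) frame
    memory_keys:   (T, H, W, Ck) features of the T memory frames
    memory_values: (T, H, W, Cv) information stored at each memory location
    Returns an (H, W, Cv) map: one retrieved vector per query pixel.
    """
    h, w, ck = query_key.shape
    q = query_key.reshape(h * w, ck)                        # (HW, Ck)
    k = memory_keys.reshape(-1, ck)                         # (THW, Ck)
    v = memory_values.reshape(-1, memory_values.shape[-1])  # (THW, Cv)

    # Assumed similarity: dot product between every query pixel and
    # every space-time memory location.
    scores = q @ k.T                                        # (HW, THW)

    # Assumed normalization: softmax over all memory locations,
    # shifted by the row maximum for numerical stability.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Each query pixel retrieves a weighted sum of the memory values.
    return (weights @ v).reshape(h, w, -1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t, h, w, ck, cv = 3, 4, 6, 8, 16        # arbitrary toy sizes
    read = memory_read(
        rng.standard_normal((h, w, ck)),
        rng.standard_normal((t, h, w, ck)),
        rng.standard_normal((t, h, w, cv)),
    )
    print(read.shape)  # (4, 6, 16)
```

Because the weights are computed over all T x H x W memory locations at once, adding a frame to the memory only lengthens the flattened key and value arrays; the read itself remains a single feed-forward operation.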
Number of pages: 14
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Publication status: Published - 2022 Jan
Bibliographical note
Funding Information:
This work was supported in part by the ICT R&D program of MSIT/IITP (2017-0-01772, Development of QA systems for Video Story Understanding to pass the Video Turing Test), the Technology Innovation Program (10073129, Development of Driving and Manipulation Intelligence based on Deep Learning and Inverse Reinforcement Learning for Dual Arm Mobile Robot) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea), and the Graduate School of Yonsei University Research Scholarship Grants in 2020.
© 2020 IEEE.
All Science Journal Classification (ASJC) codes
- Computer Vision and Pattern Recognition
- Computational Theory and Mathematics
- Artificial Intelligence
- Applied Mathematics