Weakly-supervised video scene co-parsing

Guangyu Zhong, Yi Hsuan Tsai, Ming Hsuan Yang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

In this paper, we propose a scene co-parsing framework to assign pixel-wise semantic labels in weakly-labeled videos, i.e., videos for which only video-level category labels are given. To exploit rich semantic information, we first collect all videos that share the same video-level labels and segment them into supervoxels. We then select representative supervoxels for each category via a supervoxel ranking process. This ranking problem is formulated with a submodular objective function, and a scene-object classifier is incorporated to distinguish scenes from objects. To assign each supervoxel a semantic label, we match it to the selected representatives in the feature domain. Each supervoxel is then associated with a series of category potentials and assigned the semantic label with the maximum potential. The proposed co-parsing framework extends scene parsing from single images to videos and exploits mutual information among a video collection. Experimental results on the Wild-8 and SUNY-24 datasets show that the proposed algorithm performs favorably against state-of-the-art approaches.
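
The two core steps in the abstract, selecting representatives with a submodular objective and labeling each supervoxel by its maximum category potential, can be illustrated with a minimal sketch. This is not the authors' implementation: the facility-location objective, the Gaussian similarity, the greedy maximizer, and every function name below are assumptions made for illustration, and the scene-object classifier is omitted entirely.

```python
import math


def similarity(a, b):
    """Gaussian similarity between two feature vectors (an assumed choice)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2)


def select_representatives(features, k):
    """Greedily maximize a facility-location objective
    F(S) = sum_i max_{j in S} sim(i, j), which is monotone submodular.
    This objective is a stand-in, not the one defined in the paper."""
    n = len(features)
    sim = [[similarity(features[i], features[j]) for j in range(n)]
           for i in range(n)]
    best = [0.0] * n  # how well each item is covered by the selected set
    selected = []
    for _ in range(min(k, n)):
        gains = []
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain of adding candidate j to the selected set.
            gain = sum(max(sim[i][j] - best[i], 0.0) for i in range(n))
            gains.append((gain, j))
        # Highest gain wins; ties go to the lowest index.
        _, j = max(gains, key=lambda g: (g[0], -g[1]))
        selected.append(j)
        best = [max(best[i], sim[i][j]) for i in range(n)]
    return selected


def assign_labels(features, representatives):
    """Label each feature vector with the category of maximum potential.
    Here a category's potential is assumed to be the highest similarity
    to any of its representatives (dict: label -> list of vectors)."""
    labels = []
    for f in features:
        potentials = {
            label: max(similarity(f, r) for r in reps)
            for label, reps in representatives.items()
        }
        labels.append(max(potentials, key=potentials.get))
    return labels


# Toy 2-D "supervoxel features" for two hypothetical categories.
sky = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
road = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
reps = {
    "sky": [sky[i] for i in select_representatives(sky, 1)],
    "road": [road[i] for i in select_representatives(road, 1)],
}
predicted = assign_labels([(0.2, 0.1), (4.8, 5.1)], reps)
```

Greedy selection is used here because, for monotone submodular objectives under a cardinality constraint, it comes with a well-known (1 - 1/e) approximation guarantee; whether the paper uses the same optimizer is not stated in the abstract.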

Original language: English
Title of host publication: Computer Vision - ACCV 2016 - 13th Asian Conference on Computer Vision, Revised Selected Papers
Editors: Yoichi Sato, Ko Nishino, Vincent Lepetit, Shang-Hong Lai
Publisher: Springer Verlag
Pages: 20-36
Number of pages: 17
ISBN (Print): 9783319541808
DOI: https://doi.org/10.1007/978-3-319-54181-5_2
Publication status: Published - 2017 Jan 1
Event: 13th Asian Conference on Computer Vision, ACCV 2016 - Taipei, Taiwan, Province of China
Duration: 2016 Nov 20 to 2016 Nov 24

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10111 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 13th Asian Conference on Computer Vision, ACCV 2016
Country: Taiwan, Province of China
City: Taipei
Period: 16/11/20 to 16/11/24

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Zhong, G., Tsai, Y. H., & Yang, M. H. (2017). Weakly-supervised video scene co-parsing. In Y. Sato, K. Nishino, V. Lepetit, & S-H. Lai (Eds.), Computer Vision - ACCV 2016 - 13th Asian Conference on Computer Vision, Revised Selected Papers (pp. 20-36). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10111 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-54181-5_2