Semantic co-segmentation in videos

Yi-Hsuan Tsai, Guangyu Zhong, Ming-Hsuan Yang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

18 Citations (Scopus)

Abstract

Discovering and segmenting objects in videos is a challenging task due to large variations in object appearance, deformed shapes, and cluttered backgrounds. In this paper, we propose to segment objects and understand their visual semantics from a collection of videos that link to each other, which we refer to as semantic co-segmentation. Without any prior knowledge of the videos, we first extract semantic objects and utilize a tracking-based approach to generate multiple object-like tracklets across each video. Each tracklet maintains temporally connected segments and is associated with a predicted category. To exploit rich information from other videos, we collect tracklets that are assigned to the same category from all videos, and co-select tracklets that belong to true objects by solving a submodular function. This function accounts for object properties such as appearance, shape and motion, and hence facilitates the co-segmentation process. Experiments on three video object segmentation datasets show that the proposed algorithm performs favorably against other state-of-the-art methods.
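The abstract describes co-selecting tracklets of the same category across videos by maximizing a submodular function over object properties. As a rough illustration of that general idea (not the paper's actual formulation), the sketch below greedily maximizes a facility-location objective: the sum, over every tracklet, of its best similarity to any selected tracklet. Such an objective is monotone submodular, so greedy selection carries a (1 - 1/e) approximation guarantee. The `gaussian_sim` kernel and point-valued tracklet features are hypothetical placeholders for the paper's appearance, shape, and motion terms.

```python
import math

def greedy_select(tracklets, similarity, budget):
    """Greedily pick up to `budget` tracklets maximizing a
    facility-location objective: the sum, over every tracklet, of its
    best similarity to any selected tracklet (monotone submodular)."""
    def coverage(chosen):
        if not chosen:
            return 0.0
        return sum(max(similarity(t, c) for c in chosen) for t in tracklets)

    selected = []
    while len(selected) < budget:
        base = coverage(selected)
        best, best_gain = None, 0.0
        for t in tracklets:
            if t in selected:
                continue
            gain = coverage(selected + [t]) - base
            if gain > best_gain:
                best, best_gain = t, gain
        if best is None:  # no candidate adds positive marginal gain
            break
        selected.append(best)
    return selected

def gaussian_sim(a, b):
    """Toy Gaussian kernel standing in for the paper's appearance,
    shape, and motion similarity terms (hypothetical)."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)))
```

With tracklets represented as 2-D feature points, selecting two from `[(0.0, 0.0), (5.0, 0.0), (5.1, 0.0)]` picks one representative from each cluster, since the near-duplicate at (5.1, 0.0) contributes almost no marginal coverage once (5.0, 0.0) is chosen.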

Original language: English
Title of host publication: Computer Vision - 14th European Conference, ECCV 2016, Proceedings
Editors: Nicu Sebe, Bastian Leibe, Max Welling, Jiri Matas
Publisher: Springer Verlag
Pages: 760-775
Number of pages: 16
ISBN (Print): 9783319464923
DOIs: 10.1007/978-3-319-46493-0_46
Publication status: Published - 2016 Jan 1

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9908 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Tsai, Y. H., Zhong, G., & Yang, M. H. (2016). Semantic co-segmentation in videos. In N. Sebe, B. Leibe, M. Welling, & J. Matas (Eds.), Computer Vision - 14th European Conference, ECCV 2016, Proceedings (pp. 760-775). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9908 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-46493-0_46