Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation

Donghyeon Baek, Youngmin Oh, Bumsub Ham

Research output: Chapter in Book/Report/Conference proceedingConference contribution

10 Citations (Scopus)


We address the problem of generalized zero-shot semantic segmentation (GZS3) predicting pixel-wise semantic labels for seen and unseen classes. Most GZS3 methods adopt a generative approach that synthesizes visual features of unseen classes from corresponding semantic ones (e.g., word2vec) to train novel classifiers for both seen and unseen classes. Although generative methods show decent performance, they have two limitations: (1) the visual features are biased towards seen classes; (2) the classifier should be retrained whenever novel unseen classes appear. We propose a discriminative approach to address these limitations in a unified framework. To this end, we leverage visual and semantic encoders to learn a joint embedding space, where the semantic encoder transforms semantic features to semantic prototypes that act as centers for visual features of corresponding classes. Specifically, we introduce boundary-aware regression (BAR) and semantic consistency (SC) losses to learn discriminative features. Our approach to exploiting the joint embedding space, together with BAR and SC terms, alleviates the seen bias problem. At test time, we avoid the retraining process by exploiting semantic prototypes as a nearest-neighbor (NN) classifier. To further alleviate the bias problem, we also propose an inference technique, dubbed Apollonius calibration (AC), that modulates the decision boundary of the NN classifier to the Apollonius circle adaptively. Experimental results demonstrate the effectiveness of our framework, achieving a new state of the art on standard benchmarks.

Original languageEnglish
Title of host publicationProceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021
PublisherInstitute of Electrical and Electronics Engineers Inc.
Number of pages10
ISBN (Electronic)9781665428125
Publication statusPublished - 2021
Event18th IEEE/CVF International Conference on Computer Vision, ICCV 2021 - Virtual, Online, Canada
Duration: 2021 Oct 112021 Oct 17

Publication series

NameProceedings of the IEEE International Conference on Computer Vision
ISSN (Print)1550-5499


Conference18th IEEE/CVF International Conference on Computer Vision, ICCV 2021
CityVirtual, Online

Bibliographical note

Funding Information:
We have introduced a discriminative approach, dubbed JoEm, that overcomes the limitations of generative ones in a unified framework. We have proposed two complementary losses to better learn representations in a joint embedding space. We have also presented a novel inference technique using the circle of Apollonius that alleviates a seen bias problem significantly. Finally, we have shown that our approach achieves a new state of the art on standard GZS3 benchmarks. Acknowledgments. This work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (NRF-2019R1A2C2084816) and the Yonsei University Research Fund of 2021 (2021-22-0001).

Publisher Copyright:
© 2021 IEEE

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Vision and Pattern Recognition


Dive into the research topics of 'Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation'. Together they form a unique fingerprint.

Cite this