Diverse Image-to-Image Translation via Disentangled Representations

Hsin-Ying Lee, Hung-Yu Tseng, Jia-Bin Huang, Maneesh Singh, Ming-Hsuan Yang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

12 Citations (Scopus)

Abstract

Image-to-image translation aims to learn the mapping between two visual domains. There are two main challenges for many applications: (1) the lack of aligned training pairs and (2) multiple possible outputs from a single input image. In this work, we present an approach based on disentangled representation for producing diverse outputs without paired training images. To achieve diversity, we propose to embed images onto two spaces: a domain-invariant content space capturing shared information across domains and a domain-specific attribute space. Using the disentangled features as inputs greatly reduces mode collapse. To handle unpaired training data, we introduce a novel cross-cycle consistency loss. Qualitative results show that our model can generate diverse and realistic images on a wide range of tasks. We validate the effectiveness of our approach through extensive evaluation.
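The cross-cycle consistency idea from the abstract can be illustrated with a toy sketch: encode two images into a shared content code and a domain-specific attribute code, swap the attribute codes to translate across domains, then swap again and require the second round to reconstruct the originals. The minimal numpy version below is only an illustration of that swap-and-reconstruct structure, not the authors' implementation; the random linear maps stand in for the paper's trained networks, and the names `E_c`, `E_a`, and `G` merely follow the paper's notation for the content encoder, attribute encoder, and generator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the trained networks: random linear maps.
D_IMG, D_C, D_A = 16, 8, 4
Wc = rng.normal(size=(D_C, D_IMG))        # shared, domain-invariant content encoder
Wa = rng.normal(size=(D_A, D_IMG))        # domain-specific attribute encoder
Wg = rng.normal(size=(D_IMG, D_C + D_A))  # generator: (content, attribute) -> image

def E_c(x):
    return Wc @ x

def E_a(x):
    return Wa @ x

def G(c, a):
    return Wg @ np.concatenate([c, a])

def cross_cycle_loss(x, y):
    """Two translation rounds with swapped attribute codes; L1 reconstruction."""
    # First translation: exchange attribute codes across the two domains.
    u = G(E_c(x), E_a(y))  # x's content rendered with y's attribute
    v = G(E_c(y), E_a(x))  # y's content rendered with x's attribute
    # Second translation: swap back, which should recover the originals.
    x_hat = G(E_c(u), E_a(v))
    y_hat = G(E_c(v), E_a(u))
    return np.abs(x - x_hat).mean() + np.abs(y - y_hat).mean()

x = rng.normal(size=D_IMG)  # stand-in image from domain X
y = rng.normal(size=D_IMG)  # stand-in image from domain Y
loss = cross_cycle_loss(x, y)
```

With random untrained maps the loss is simply some nonnegative number; in the paper this quantity is minimized jointly with adversarial and other losses so that the swap-and-swap-back round trip reproduces the inputs.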

Original language: English
Title of host publication: Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings
Editors: Martial Hebert, Vittorio Ferrari, Cristian Sminchisescu, Yair Weiss
Publisher: Springer Verlag
Pages: 36-52
Number of pages: 17
ISBN (Print): 9783030012458
DOI: 10.1007/978-3-030-01246-5_3
Publication status: Published - 2018 Jan 1
Event: 15th European Conference on Computer Vision, ECCV 2018 - Munich, Germany
Duration: 2018 Sep 8 - 2018 Sep 14

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11205 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 15th European Conference on Computer Vision, ECCV 2018
Country: Germany
City: Munich
Period: 18/9/8 - 18/9/14

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Lee, H.-Y., Tseng, H.-Y., Huang, J.-B., Singh, M., & Yang, M.-H. (2018). Diverse Image-to-Image Translation via Disentangled Representations. In M. Hebert, V. Ferrari, C. Sminchisescu, & Y. Weiss (Eds.), Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings (pp. 36-52). (Lecture Notes in Computer Science; Vol. 11205 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-030-01246-5_3