Image-to-image translation aims to learn the mapping between two visual domains. There are two main challenges for many applications: (1) the lack of aligned training pairs and (2) multiple possible outputs from a single input image. In this work, we present an approach based on disentangled representation for producing diverse outputs without paired training images. To achieve diversity, we propose to embed images onto two spaces: a domain-invariant content space capturing shared information across domains and a domain-specific attribute space. Using the disentangled features as inputs greatly reduces mode collapse. To handle unpaired training data, we introduce a novel cross-cycle consistency loss. Qualitative results show that our model can generate diverse and realistic images on a wide range of tasks. We validate the effectiveness of our approach through extensive evaluation.
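The abstract names a cross-cycle consistency loss but does not spell it out. The following is a minimal sketch of one way such a loss could be computed, assuming that attributes are swapped between two unpaired images, swapped back in a second pass, and the results compared with the originals using an L1 distance. The function names, the single shared content encoder, and the toy encoders that simply split a 4-element vector are all illustrative assumptions, not the authors' implementation.

```python
from typing import Callable, List

Vector = List[float]
Encoder = Callable[[Vector], Vector]
Generator = Callable[[Vector, Vector], Vector]


def l1(a: Vector, b: Vector) -> float:
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cross_cycle_loss(x: Vector, y: Vector,
                     enc_content: Encoder,
                     enc_attr_x: Encoder, enc_attr_y: Encoder,
                     gen_x: Generator, gen_y: Generator) -> float:
    """Hypothetical cross-cycle loss for one unpaired (x, y) sample."""
    # Disentangle each image into content and attribute codes.
    c_x, a_x = enc_content(x), enc_attr_x(x)
    c_y, a_y = enc_content(y), enc_attr_y(y)

    # First translation: swap attribute codes across domains.
    u = gen_x(c_y, a_x)  # content of y rendered in domain X
    v = gen_y(c_x, a_y)  # content of x rendered in domain Y

    # Second translation: encode the translated images and swap back.
    c_u, a_u = enc_content(u), enc_attr_x(u)
    c_v, a_v = enc_content(v), enc_attr_y(v)
    x_rec = gen_x(c_v, a_u)
    y_rec = gen_y(c_u, a_v)

    # After two swaps, each image should be recovered.
    return l1(x_rec, x) + l1(y_rec, y)


# Toy stand-ins (assumptions): the first two entries of a vector act as
# "content" and the last two as "attribute".
def toy_content(img: Vector) -> Vector:
    return img[:2]


def toy_attr(img: Vector) -> Vector:
    return img[2:]


def toy_gen(content: Vector, attr: Vector) -> Vector:
    return content + attr  # list concatenation


def leaky_gen(content: Vector, attr: Vector) -> Vector:
    return content + [0.0, 0.0]  # ignores the attribute code


x = [1.0, 2.0, 0.1, 0.2]
y = [5.0, 6.0, 0.9, 0.8]

loss_ok = cross_cycle_loss(x, y, toy_content, toy_attr, toy_attr,
                           toy_gen, toy_gen)
loss_leaky = cross_cycle_loss(x, y, toy_content, toy_attr, toy_attr,
                              leaky_gen, leaky_gen)
print(loss_ok, loss_leaky)
```

In this toy setting the perfectly disentangled generator recovers both inputs, so its loss is zero, while the generator that discards the attribute code cannot recover them and is penalized. In a real model the encoders and generators would be trained networks, and this term would be only one of several losses.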
|Title of host publication||Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings|
|Editors||Martial Hebert, Vittorio Ferrari, Cristian Sminchisescu, Yair Weiss|
|Number of pages||17|
|Publication status||Published - 2018|
|Event||15th European Conference on Computer Vision, ECCV 2018 - Munich, Germany; Duration: 2018 Sep 8 → 2018 Sep 14|
|Name||Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)|
|Other||15th European Conference on Computer Vision, ECCV 2018|
|Period||18/9/8 → 18/9/14|
Bibliographical note
Funding Information:
Acknowledgements. This work is supported in part by the NSF CAREER Grant #1149783, the NSF Grant #1755785, and gifts from Verisk, Adobe and Nvidia.
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Computer Science (all)