Dual Compositional Learning in Interactive Image Retrieval

Jongseok Kim, Youngjae Yu, Hoeseong Kim, Gunhee Kim

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We present an approach named Dual Composition Network (DCNet) for interactive image retrieval. The task is to find the best target image given a natural language query and a reference image. To accomplish this task, existing methods have focused on learning a composite representation of the reference image and the text query that is as close as possible to the embedding of the target image. We refer to this approach as Composition Network. In this work, we propose to close the loop with Correction Network, which models the difference between the reference and target images in the embedding space and matches it with the embedding of the text query. That is, we consider two cyclic directional mappings for triplets of (reference image, text query, target image) by using both Composition Network and Correction Network. We also propose a joint training loss that can further improve the robustness of multimodal representation learning. We evaluate the proposed model on three benchmark datasets for multimodal retrieval: Fashion-IQ, Shoes, and Fashion200K. Our experiments show that DCNet achieves new state-of-the-art performance on all three datasets. They also show that adding Correction Network consistently improves multiple existing methods that are based solely on Composition Network. Moreover, an ensemble of our models won first place in the Fashion-IQ 2020 challenge held at a CVPR 2020 workshop.
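The two directional mappings described above can be illustrated with a small sketch. It is a toy illustration under stated assumptions, not the authors' implementation, and the abstract does not specify the architectures or the exact form of the joint loss.

Here, Composition Network maps (reference, text) to an estimate of the target embedding. Correction Network maps (reference, target) to an estimate of the text embedding. Both networks are replaced by fixed arithmetic stand-ins: the composition step is a vector sum and the correction step is a vector difference.

The names `compose`, `correct`, `batch_contrastive_loss`, `joint_loss`, and the parameters `temperature` and `weight` are all hypothetical. The batch-wise softmax cross-entropy (an InfoNCE-style contrastive loss) is an assumed choice of retrieval loss.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale v to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def compose(ref: Sequence[float], text: Sequence[float]) -> Vector:
    # Hypothetical stand-in for the learned Composition Network:
    # (reference, text) -> estimate of the target embedding.
    return l2_normalize([r + t for r, t in zip(ref, text)])


def correct(ref: Sequence[float], tgt: Sequence[float]) -> Vector:
    # Hypothetical stand-in for the learned Correction Network:
    # (reference, target) -> estimate of the text embedding, built
    # from their difference in the embedding space.
    return l2_normalize([g - r for r, g in zip(ref, tgt)])


def batch_contrastive_loss(queries: Sequence[Vector], keys: Sequence[Vector],
                           temperature: float = 0.1) -> float:
    # Assumed loss: query i should match key i, and every other key in the
    # batch is a negative. Returns the mean softmax cross-entropy.
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, k) / temperature for k in keys]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(queries)


def joint_loss(refs, texts, tgts, weight: float = 1.0) -> float:
    # Composition direction: compose(ref, text) should retrieve tgt.
    composed = [compose(r, t) for r, t in zip(refs, texts)]
    composition_term = batch_contrastive_loss(
        composed, [l2_normalize(g) for g in tgts])
    # Correction direction: correct(ref, tgt) should retrieve text.
    corrections = [correct(r, g) for r, g in zip(refs, tgts)]
    correction_term = batch_contrastive_loss(
        corrections, [l2_normalize(t) for t in texts])
    # Hypothetical weighting of the two directions.
    return composition_term + weight * correction_term


# Toy batch of three triplets in which each target equals reference + text.
refs = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
texts = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
tgts = [[r + t for r, t in zip(ref, txt)] for ref, txt in zip(refs, texts)]

matched = joint_loss(refs, texts, tgts)
shuffled = joint_loss(refs, texts, tgts[1:] + tgts[:1])  # mismatched targets
print(f"matched triplets:  {matched:.4f}")
print(f"shuffled targets:  {shuffled:.4f}")
```

With correctly paired triplets, both terms are close to zero. When the targets are shuffled, each composed query is closest to a target that is now labeled as a negative, so the loss rises sharply.

Summing the two directions means a triplet is penalized whenever either mapping fails. In a real model, `compose` and `correct` would be trainable networks whose parameters are updated against this kind of objective.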

Original language: English
Title of host publication: 35th AAAI Conference on Artificial Intelligence, AAAI 2021
Publisher: Association for the Advancement of Artificial Intelligence
Pages: 1771-1779
Number of pages: 9
ISBN (Electronic): 9781713835974
Publication status: Published - 2021
Event: 35th AAAI Conference on Artificial Intelligence, AAAI 2021 - Virtual, Online
Duration: 2021 Feb 2 - 2021 Feb 9

Publication series

Name: 35th AAAI Conference on Artificial Intelligence, AAAI 2021
Volume: 2B

Conference

Conference: 35th AAAI Conference on Artificial Intelligence, AAAI 2021
City: Virtual, Online
Period: 21/2/2 - 21/2/9

Bibliographical note

Funding Information:
We thank SNUVL lab members for helpful comments. This research was supported by AIR Lab (AI Research Lab) in Hyundai Motor Company through HMC-SNUAI Consortium Fund and Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-01082, SW Star-Lab and No. 2017-0-01772, Video Turing Test). Jongseok Kim was supported by Hyundai Motor Chung Mong-Koo Foundation. Gunhee Kim is the corresponding author.

Publisher Copyright:
Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
