Visual question answering on 360° images

Shih Han Chou, Wei Lun Chao, Wei Sheng Lai, Min Sun, Ming Hsuan Yang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this work, we introduce VQA 360°, a novel task of visual question answering on 360° images. Unlike a normal field-of-view image, a 360° image captures the entire visual content around the optical center of a camera, demanding more sophisticated spatial understanding and reasoning. To address this problem, we collect the first VQA 360° dataset, containing around 17,000 real-world image-question-answer triplets for a variety of question types. We then study two different VQA models on VQA 360°: a conventional model that takes an equirectangular image (with intrinsic distortion) as input, and a dedicated model that first projects a 360° image onto cubemaps and subsequently aggregates the information from multiple spatial resolutions. We demonstrate that the cubemap-based model with multi-level fusion and attention diffusion performs favorably against other variants and the equirectangular-based models. Nevertheless, the gap between human and machine performance reveals the need for more advanced VQA 360° algorithms. We therefore expect our dataset and studies to serve as the benchmark for future development in this challenging task. Dataset, code, and pre-trained models are available online.
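The cubemap-based model described above begins by re-projecting the equirectangular input onto six cube faces. The following is a minimal sketch of that projection step only. It makes several assumptions. It uses a y-up, z-forward coordinate frame. The face names and orientations (FACES, face_direction) are hypothetical. It samples by nearest neighbour on a plain list-of-rows image. It is not the authors' released code, and it does not cover the multi-level fusion or attention diffusion stages.

```python
import math

# Hypothetical face names in a hypothetical order; the paper's actual cubemap
# layout and face orientations are not given in the abstract.
FACES = ("front", "right", "back", "left", "up", "down")


def face_direction(face, a, b):
    """Return an (unnormalized) 3-D ray for normalized face coords a, b in [-1, 1].

    Assumed frame: x points right, y points up, z points forward (front-face
    centre). On each face, a grows to the right and b grows downward.
    """
    if face == "front":
        return (a, -b, 1.0)
    if face == "right":
        return (1.0, -b, -a)
    if face == "back":
        return (-a, -b, -1.0)
    if face == "left":
        return (-1.0, -b, a)
    if face == "up":
        return (a, 1.0, b)
    if face == "down":
        return (a, -1.0, -b)
    raise ValueError(f"unknown face: {face}")


def face_pixel_to_equirect(face, i, j, face_size, eq_width, eq_height):
    """Map pixel (row i, column j) of a cube face to continuous (u, v) coords
    in an equirectangular image of size eq_width x eq_height."""
    # Sample at pixel centres, rescaled to [-1, 1].
    a = 2.0 * (j + 0.5) / face_size - 1.0
    b = 2.0 * (i + 0.5) / face_size - 1.0
    x, y, z = face_direction(face, a, b)
    lon = math.atan2(x, z)                 # longitude in [-pi, pi], 0 at front centre
    lat = math.atan2(y, math.hypot(x, z))  # latitude in [-pi/2, pi/2]
    u = (lon / (2.0 * math.pi) + 0.5) * eq_width
    v = (0.5 - lat / math.pi) * eq_height
    return u, v


def equirect_to_cubemap(eq_image, face_size):
    """Resample an equirectangular image (a list of equal-length rows) into six
    face_size x face_size faces using nearest-neighbour lookup."""
    eq_height, eq_width = len(eq_image), len(eq_image[0])
    cubemap = {}
    for face in FACES:
        grid = []
        for i in range(face_size):
            row = []
            for j in range(face_size):
                u, v = face_pixel_to_equirect(face, i, j, face_size, eq_width, eq_height)
                col = int(u) % eq_width         # longitude wraps around
                r = min(int(v), eq_height - 1)  # clamp at the bottom pole
                row.append(eq_image[r][col])
            grid.append(row)
        cubemap[face] = grid
    return cubemap
```

Each face in this sketch is a 90° perspective view. Mapping through longitude and latitude means the faces avoid the severe horizontal stretching that an equirectangular image shows near the poles. A real pipeline would typically use bilinear interpolation on array data rather than nearest-neighbour lookup on nested lists.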

Original language: English
Title of host publication: Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1596-1605
Number of pages: 10
ISBN (Electronic): 9781728165530
DOI: https://doi.org/10.1109/WACV45572.2020.9093452
Publication status: Published - 2020 Mar
Event: 2020 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2020 - Snowmass Village, United States
Duration: 2020 Mar 1 to 2020 Mar 5

Publication series

Name: Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020

Conference

Conference: 2020 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2020
Country: United States
City: Snowmass Village
Period: 20/3/1 to 20/3/5

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Computer Vision and Pattern Recognition


Cite this

    Chou, S. H., Chao, W. L., Lai, W. S., Sun, M., & Yang, M. H. (2020). Visual question answering on 360° images. In Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020 (pp. 1596-1605). [9093452] (Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/WACV45572.2020.9093452