Selective residual learning for Visual Question Answering

Jongkwang Hong, Sungho Park, Hyeran Byun

Research output: Contribution to journal › Article

Abstract

Visual Question Answering (VQA) aims to infer an answer given a textual question paired with an image. VQA methods must learn the relationships among image region features. However, existing methods learn inefficiently, which can cause a performance drop, because current intra-relationship methods try to learn all intra-relationships regardless of their importance. In this paper, a novel self-attention-based VQA module named Selective Residual learning (SelRes) is proposed. SelRes applies residual learning selectively in self-attention networks: it measures the importance of the input vectors via the attention map and restricts residual learning to the selected regions that relate to the correct answer. Selective masking is also proposed, which ensures that the selection made by SelRes is preserved across the multi-stack structure of the VQA network. Our full model achieves new state-of-the-art performance in both from-scratch and fine-tuned settings.
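To make the mechanism concrete, below is a minimal sketch of how selective residual learning could be realized in a single self-attention layer. It is an illustrative reconstruction, not the authors' implementation: the importance scoring by column-summing the attention map, the `keep_ratio` parameter, and the top-k selection are assumptions made for the example.

```python
import torch
import torch.nn as nn


class SelectiveResidualAttention(nn.Module):
    """Illustrative sketch of selective residual learning (SelRes).

    Assumptions (not from the paper's reference code): each position's
    importance is the total attention it receives (a column sum of the
    attention map), and the identity shortcut is kept only for the
    top-k most important positions.
    """

    def __init__(self, dim, num_heads=8, keep_ratio=0.5):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.keep_ratio = keep_ratio

    def forward(self, x):
        # Self-attention; attn_map is averaged over heads, shape (B, L, L).
        out, attn_map = self.attn(x, x, x, need_weights=True)
        # Importance of position j = attention it receives from all queries.
        importance = attn_map.sum(dim=1)                      # (B, L)
        k = max(1, int(self.keep_ratio * x.size(1)))
        topk = importance.topk(k, dim=1).indices
        # Binary selection mask: 1 for selected positions, 0 elsewhere.
        mask = torch.zeros_like(importance).scatter_(1, topk, 1.0)
        # Residual connection only at selected positions; the rest pass
        # through the attention output alone.
        return out + mask.unsqueeze(-1) * x


# Example: batch of 2 images, 16 region features, 256-dim embeddings.
layer = SelectiveResidualAttention(dim=256, num_heads=8, keep_ratio=0.5)
y = layer(torch.randn(2, 16, 256))  # y.shape == (2, 16, 256)
```

In this sketch, positions outside the selected set receive no identity shortcut, which mirrors the paper's idea of limiting residual learning to the regions related to the answer; the paper's selective masking would additionally carry this selection through the stacked layers.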

Original language: English
Journal: Neurocomputing
DOIs
Publication status: Accepted/In press - 2020 Jan 1

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Cognitive Neuroscience
  • Artificial Intelligence
