PiCANet: Pixel-Wise Contextual Attention Learning for Accurate Saliency Detection

Nian Liu, Junwei Han, Ming Hsuan Yang

Research output: Contribution to journal › Article


Existing saliency models typically incorporate contexts holistically. However, for each pixel, usually only part of its context region contributes to saliency prediction, while the other parts are likely to be noise or distractions. In this paper, we propose a novel pixel-wise contextual attention network (PiCANet) that selectively attends to informative context locations at each pixel. The proposed PiCANet generates an attention map over the contextual region of each pixel and constructs attentive contextual features by selectively incorporating the features of useful context locations. We present three formulations of the PiCANet, obtained by embedding the pixel-wise contextual attention mechanism into pooling and convolution operations that attend to global or local contexts. All three formulations are fully differentiable and can be integrated into convolutional neural networks and trained jointly with them. In this work, we introduce the proposed PiCANets into a U-Net model for salient object detection. The generated global and local attention maps can learn to incorporate global contrast and regional smoothness, which helps localize and highlight salient objects more accurately and uniformly. Experimental results show that the proposed PiCANets perform effectively for saliency detection compared with state-of-the-art methods. Furthermore, we demonstrate the effectiveness and generalization ability of the PiCANets on semantic segmentation and object detection, where they improve performance.
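
The attention-weighted pooling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `softmax` and `local_attentive_pool`, the nested-list data layout, and the zero-padding at image borders are all assumptions made for illustration. In particular, the per-pixel attention scores (`logits`) are taken as an input here, whereas in a trained network they would be produced by learned layers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_attentive_pool(features, logits, radius):
    """Attention-weighted pooling over each pixel's local context window.

    features: H x W x C nested lists (the input feature map).
    logits:   H x W x K nested lists, with K = (2 * radius + 1) ** 2, holding
              one raw attention score per context location for every pixel.
              Assumption: supplied by the caller, not learned here.
    Returns an H x W x C nested list of attentive contextual features.
    """
    height = len(features)
    width = len(features[0])
    channels = len(features[0][0])
    side = 2 * radius + 1
    output = []
    for y in range(height):
        row = []
        for x in range(width):
            # One attention distribution per pixel, over its own context window.
            weights = softmax(logits[y][x])
            pooled = [0.0] * channels
            for k, weight in enumerate(weights):
                cy = y + k // side - radius
                cx = x + k % side - radius
                # Locations outside the image contribute nothing (zero padding).
                if 0 <= cy < height and 0 <= cx < width:
                    for c in range(channels):
                        pooled[c] += weight * features[cy][cx][c]
            row.append(pooled)
        output.append(row)
    return output
```

With all scores equal, the operation reduces to plain average pooling over the window. With one score much larger than the rest, the output at that pixel is dominated by the feature at the attended location, which is the selective behaviour the abstract describes.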

Original language: English
Article number: 9076883
Pages (from-to): 6438-6451
Number of pages: 14
Journal: IEEE Transactions on Image Processing
Publication status: Published - 2020

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Graphics and Computer-Aided Design
