Existing saliency models typically incorporate contexts holistically. However, for each pixel, usually only part of its context region contributes to saliency prediction, while other parts are likely either noise or distractions. In this paper, we propose a novel pixel-wise contextual attention network (PiCANet) to selectively attend to informative context locations at each pixel. The proposed PiCANet generates an attention map over the contextual region of each pixel and construct attentive contextual features via selectively incorporating the features of useful context locations. We present three formulations of the PiCANet via embedding the pixel-wise contextual attention mechanism into the pooling and convolution operations with attending to global or local contexts. All the three models are fully differentiable and can be integrated with convolutional neural networks with joint training. In this work, we introduce the proposed PiCANets into a U-Net model for salient object detection. The generated global and local attention maps can learn to incorporate global contrast and regional smoothness, which help localize and highlight salient objects more accurately and uniformly. Experimental results show that the proposed PiCANets perform effectively for saliency detection against the state-of-the-art methods. Furthermore, we demonstrate the effectiveness and generalization ability of the PiCANets on semantic segmentation and object detection with improved performance.
All Science Journal Classification (ASJC) codes
- Computer Graphics and Computer-Aided Design