We address the problem of weakly-supervised semantic segmentation (WSSS) using bounding box annotations. Although object bounding boxes are good indicators for segmenting the corresponding objects, they do not specify object boundaries, making it hard to train convolutional neural networks (CNNs) for semantic segmentation. We find that background regions are perceptually consistent in part within an image, and this can be leveraged to discriminate foreground and background regions inside object bounding boxes. To implement this idea, we propose a novel pooling method, dubbed background-aware pooling (BAP), that focuses on aggregating foreground features inside the bounding boxes using attention maps. This allows us to extract high-quality pseudo segmentation labels for training CNNs for semantic segmentation, but the labels still contain noise, especially at object boundaries. To address this problem, we also introduce a noise-aware loss (NAL) that makes the networks less susceptible to incorrect labels. Experimental results demonstrate that learning with our pseudo labels alone already outperforms state-of-the-art weakly- and semi-supervised methods on the PASCAL VOC 2012 dataset, and the NAL further boosts the performance.
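The pooling the abstract describes amounts to an attention-weighted average of CNN features inside a bounding box, so that background pixels (low attention) contribute little compared with plain global average pooling. A minimal NumPy sketch, assuming a feature map of shape (C, H, W) and a per-pixel foreground attention map; the function name, shapes, and the small stabilizer are illustrative and not the paper's exact formulation:

```python
import numpy as np

def background_aware_pooling(features, attention, box):
    """Attention-weighted feature aggregation inside a bounding box.

    features:  (C, H, W) CNN feature map
    attention: (H, W) values in [0, 1], high where a pixel is likely foreground
    box:       (x1, y1, x2, y2) in feature-map coordinates
    """
    x1, y1, x2, y2 = box
    f = features[:, y1:y2, x1:x2]   # (C, h, w) features inside the box
    a = attention[y1:y2, x1:x2]     # (h, w) foreground attention
    # Weighted average: unlike global average pooling (GAP), pixels the
    # attention map deems background are largely ignored.
    return (f * a).sum(axis=(1, 2)) / (a.sum() + 1e-8)
```

With a uniform attention map this reduces to ordinary average pooling over the box, which makes the role of the attention weighting explicit.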
|Title of host publication||Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021|
|Publisher||IEEE Computer Society|
|Number of pages||10|
|Publication status||Published - 2021|
|Event||2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021 - Virtual, Online, United States|
Duration: 2021 Jun 19 → 2021 Jun 25
|Name||Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition|
|Conference||2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021|
|Period||21/6/19 → 21/6/25|
Bibliographical note
Funding Information:
We have presented a novel pooling method for WSSS, dubbed BAP, that uses a background prior to discriminate foreground and background regions inside object bounding boxes. We have shown that our BAP produces better pseudo ground-truth labels than the conventional GAP. We have also proposed a NAL for training a segmentation network that makes it less susceptible to incorrect pseudo labels. Finally, we have shown that our approach achieves state-of-the-art performance on PASCAL VOC and MS-COCO. Acknowledgments. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (NRF-2019R1A2C2084816).
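The noise-aware training idea, making the network less susceptible to incorrect pseudo labels, can be illustrated as a confidence-weighted per-pixel cross-entropy, where pixels with unreliable pseudo labels (typically near object boundaries) are down-weighted. This NumPy sketch is a generic weighted cross-entropy under assumed shapes, not the paper's exact NAL:

```python
import numpy as np

def noise_aware_loss(log_probs, pseudo_labels, confidence):
    """Confidence-weighted cross-entropy over pixels.

    log_probs:     (K, H, W) per-class log-probabilities from the network
    pseudo_labels: (H, W) integer pseudo labels in [0, K)
    confidence:    (H, W) weights in [0, 1]; low where labels are likely noisy
    """
    H, W = pseudo_labels.shape
    # Gather each pixel's log-probability for its pseudo label.
    rows = np.arange(H)[:, None]
    cols = np.arange(W)[None, :]
    picked = log_probs[pseudo_labels, rows, cols]   # (H, W)
    # Down-weight pixels whose pseudo labels are likely incorrect.
    return -(confidence * picked).sum() / (confidence.sum() + 1e-8)
```

Setting the confidence to 1 everywhere recovers the standard per-pixel cross-entropy averaged over the image.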
© 2021 IEEE
All Science Journal Classification (ASJC) codes
- Computer Vision and Pattern Recognition