Existing weakly-supervised semantic segmentation methods that use image-level annotations typically rely on initial responses to locate object regions. However, the response maps generated by a classification network usually focus on discriminative object parts, because the network does not need the entire object to optimize its objective function. To encourage the network to attend to other parts of an object, we propose a simple yet effective approach that introduces a self-supervised task exploiting sub-category information. Specifically, we perform clustering on image features to generate pseudo sub-category labels within each annotated parent class, and construct a sub-category objective that assigns the network a more challenging task. By iteratively clustering image features, the training process does not limit itself to the most discriminative object parts, which improves the quality of the response maps. We conduct extensive analysis to validate the proposed method and show that our approach performs favorably against state-of-the-art approaches.
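The pseudo-labeling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the plain k-means routine, the function names (`kmeans`, `sub_category_labels`), the number of sub-categories `k`, and the toy feature vectors are all assumptions made for illustration. In the approach described, the features would come from the classification network and the clustering would be repeated as training proceeds; the sketch covers a single clustering round only.

```python
import random
from collections import defaultdict


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    k = min(k, len(points))  # cannot have more clusters than points
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        assign = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def sub_category_labels(features, image_labels, k):
    """Cluster features separately within each annotated parent class.

    features:     one feature vector per image (hypothetical stand-ins
                  for features extracted by the classification network)
    image_labels: one set of parent class ids per image
    Returns, per image, a dict {parent_class: pseudo sub-category id}.
    """
    by_class = defaultdict(list)
    for idx, classes in enumerate(image_labels):
        for c in classes:
            by_class[c].append(idx)

    pseudo = [dict() for _ in features]
    for c, idxs in by_class.items():
        clusters = kmeans([features[i] for i in idxs], k)
        for i, cluster_id in zip(idxs, clusters):
            pseudo[i][c] = cluster_id
    return pseudo


if __name__ == "__main__":
    # Toy data: four images of parent class 0, one of which also has class 1.
    feats = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [9.0, 9.0]]
    labels = [{0}, {0}, {0}, {0, 1}, {1}]
    for image_id, subs in enumerate(sub_category_labels(feats, labels, k=2)):
        print(image_id, subs)
```

Clustering each parent class separately keeps the pseudo labels consistent with the image-level annotations: an image only receives sub-category labels for the classes it is annotated with.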
|Number of pages||10|
|Journal||Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition|
|Publication status||Published - 2020|
|Event||2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 - Virtual, Online, United States|
Duration: 2020 Jun 14 → 2020 Jun 19
Bibliographical note
Funding Information:
In this paper, we propose a simple yet effective approach to improve class activation maps by introducing a self-supervised task that discovers sub-categories in an unsupervised manner. Without bells and whistles, our approach performs favorably against existing weakly-supervised semantic segmentation methods. Specifically, we develop an iterative learning scheme that clusters image features for each parent class and trains the classification network on sub-category objectives. Unlike existing schemes that aggregate multiple response maps, our approach generates better initial predictions without adding complexity or inference time to the model. We conduct extensive experimental analysis to demonstrate the effectiveness of exploiting sub-category information. Finally, we show that our algorithm produces better activation maps, thereby improving the final semantic segmentation performance.
Acknowledgments. This work is supported in part by NSF CAREER Grant #1149783, and gifts from eBay and Google.
© 2020 IEEE.
All Science Journal Classification (ASJC) codes
- Computer Vision and Pattern Recognition