We address the problem of weakly supervised object localization, where only image-level annotations are available for training. Many existing approaches tackle this problem by mining object proposals. However, the substantial noise in object proposals introduces ambiguity when learning discriminative object models. Such approaches are sensitive to model initialization and often converge to an undesirable local minimum. In this paper, we address this problem through progressive domain adaptation with two main steps: classification adaptation and detection adaptation. In classification adaptation, we transfer a pre-trained network to our multi-label classification task, which recognizes whether each object class is present in an image. In detection adaptation, we first use a mask-out strategy to collect class-specific object proposals and then apply multiple instance learning to mine confident candidates. We then use these selected object proposals to fine-tune all the layers, resulting in a fully adapted detection network. We extensively evaluate localization performance on the PASCAL VOC and ILSVRC datasets and demonstrate significant improvements over state-of-the-art methods.
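The two detection-adaptation steps named in the abstract (mask-out proposal collection, then multiple instance learning) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `class_score` is a hypothetical stand-in for the adapted multi-label classifier's score for one class, masked pixels are filled with 0, and the MIL step is a simplified alternating selection with a mean-difference linear scorer, not the paper's actual MIL solver. Network fine-tuning is omitted entirely.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open pixel coordinates
Image = List[List[float]]        # image[y][x]
Vec = Sequence[float]


def mask_out(image: Image, box: Box, fill: float = 0.0) -> Image:
    """Return a copy of `image` with the pixels inside `box` set to `fill`."""
    x0, y0, x1, y1 = box
    out = [row[:] for row in image]
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = fill
    return out


def mask_out_scores(image: Image, boxes: List[Box],
                    class_score: Callable[[Image], float]) -> List[float]:
    """Score drop per proposal: a large drop suggests the box covers the object."""
    base = class_score(image)
    return [base - class_score(mask_out(image, b)) for b in boxes]


def select_class_specific(image: Image, boxes: List[Box],
                          class_score: Callable[[Image], float],
                          top_k: int) -> List[Box]:
    """Keep the top_k proposals whose removal hurts the class score most."""
    drops = mask_out_scores(image, boxes, class_score)
    order = sorted(range(len(boxes)), key=lambda i: drops[i], reverse=True)
    return [boxes[i] for i in order[:top_k]]


def _dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def _mean(vectors: List[Vec]) -> List[float]:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def mil_mine(pos_bags: List[List[Vec]], neg_instances: List[Vec],
             iters: int = 5) -> List[int]:
    """Simplified MIL: pick the most confident instance in each positive bag.

    Alternates between (a) a linear scorer w = mean(selected positives) -
    mean(negatives) and (b) re-selecting the max-scoring instance per bag.
    """
    neg_mean = _mean(neg_instances)
    # Initialize with all instances from positive bags treated as positive.
    pos_mean = _mean([x for bag in pos_bags for x in bag])
    w = [p - n for p, n in zip(pos_mean, neg_mean)]
    picks: List[int] = []
    for _ in range(iters):
        new_picks = [max(range(len(bag)), key=lambda i: _dot(w, bag[i]))
                     for bag in pos_bags]
        if new_picks == picks:  # selections stable: converged
            break
        picks = new_picks
        sel_mean = _mean([bag[i] for bag, i in zip(pos_bags, picks)])
        w = [p - n for p, n in zip(sel_mean, neg_mean)]
    return picks
```

For example, with a 4x4 toy image whose "object" occupies the top-left 2x2 pixels and a toy score equal to the pixel sum, masking out `(0, 0, 2, 2)` removes the whole object and therefore yields the largest drop, so `select_class_specific` ranks it first.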
|Title of host publication||Proceedings - 29th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016|
|Publisher||IEEE Computer Society|
|Number of pages||9|
|Publication status||Published - 2016 Dec 9|
|Event||29th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016 - Las Vegas, United States|
Duration: 2016 Jun 26 → 2016 Jul 1
|Name||Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition|
|Conference||29th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016|
|Period||16/6/26 → 16/7/1|
Bibliographical note
Publisher Copyright:
© 2016 IEEE.
All Science Journal Classification (ASJC) codes
- Computer Vision and Pattern Recognition