Visual attention, a concept derived from cognitive neuroscience, focuses human perception on the most pertinent subset of the sensory data. Recently, significant efforts have been made to exploit attention schemes to advance computer vision systems. In visual tracking, it is often challenging to track target objects undergoing large appearance changes. Attention maps facilitate visual tracking by selectively attending to temporally robust features. Existing tracking-by-detection approaches mainly use additional attention modules to generate feature weights, as the classifiers themselves are not equipped with such mechanisms. In this paper, we propose a reciprocative learning algorithm to exploit visual attention for training deep classifiers. The proposed algorithm consists of feed-forward and backward operations to generate attention maps, which serve as regularization terms coupled with the original classification loss function for training. The deep classifier thus learns to attend to the regions of target objects that are robust to appearance changes. Extensive experiments on large-scale benchmark datasets show that the proposed attentive tracking method performs favorably against state-of-the-art approaches.
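The idea of coupling an attention-based regularizer with the classification loss can be illustrated with a minimal sketch. It is not the paper's formulation: the two-layer ReLU scorer, the names `forward`, `attention_map`, `attention_regularizer`, and `regularized_loss`, the weight `lam`, and the specific penalty (standard deviation over mean of the map, applied only to target samples) are all assumptions made here for illustration. The sketch only evaluates the combined loss; it does not perform the training update.

```python
import math

def forward(x, W, v):
    """Feed-forward pass of a toy scorer: score = v . relu(W x)."""
    pre = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    score = sum(vi * max(p, 0.0) for vi, p in zip(v, pre))
    return score, pre

def attention_map(x, W, v):
    """Backward pass: gradient of the score w.r.t. the input, positive part only."""
    _, pre = forward(x, W, v)
    grad = [0.0] * len(x)
    for row, vi, p in zip(W, v, pre):
        if p > 0.0:  # ReLU passes gradient only where it was active
            for j, w in enumerate(row):
                grad[j] += vi * w
    return [max(g, 0.0) for g in grad]

def attention_regularizer(att, eps=1e-8):
    """Hypothetical penalty: std / mean of the map (small when the map is uniform)."""
    mu = sum(att) / len(att)
    var = sum((a - mu) ** 2 for a in att) / len(att)
    return math.sqrt(var) / (mu + eps)

def regularized_loss(x, label, W, v, lam=0.1):
    """Binary cross-entropy plus the attention penalty (target samples only, by assumption)."""
    score, _ = forward(x, W, v)
    # log(1 + exp(-s)) for label 1, log(1 + exp(s)) for label 0
    ce = math.log1p(math.exp(-score if label == 1 else score))
    reg = attention_regularizer(attention_map(x, W, v)) if label == 1 else 0.0
    return ce + lam * reg

if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]  # toy weights, not learned
    print(attention_map([1.0, 2.0], W, [1.0, 1.0]))
    print(regularized_loss([1.0, 2.0], 1, W, [1.0, 1.0]))
```

In an actual training loop the penalty would itself be differentiated with respect to the classifier weights, which calls for an automatic differentiation framework rather than the hand-written backward pass used in this sketch.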
|Number of pages|11|
|Journal|Advances in Neural Information Processing Systems|
|Publication status|Published - 2018|
|Event|32nd Conference on Neural Information Processing Systems, NeurIPS 2018 - Montreal, Canada (Duration: 2018 Dec 2 → 2018 Dec 8)|
Bibliographical note
Funding Information:
The work is supported in part by the Beijing Municipal Science and Technology Commission project under Grant No. Z181100001918005, the Fundamental Research Funds for the Central Universities (2017RC08), NSF CAREER Grant No. 1149783, and gifts from NVIDIA. Shi Pu is supported by a scholarship from the China Scholarship Council.
All Science Journal Classification (ASJC) codes
- Computer Networks and Communications
- Information Systems
- Signal Processing