TY - JOUR
T1 - VITAL
T2 - VIsual tracking via adversarial learning
AU - Song, Yibing
AU - Ma, Chao
AU - Wu, Xiaohe
AU - Gong, Lijun
AU - Bao, Linchao
AU - Zuo, Wangmeng
AU - Shen, Chunhua
AU - Lau, Rynson W.H.
AU - Yang, Ming Hsuan
N1 - Publisher Copyright:
Copyright © 2018, The Authors. All rights reserved.
Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2018/4/11
Y1 - 2018/4/11
N2 - The tracking-by-detection framework consists of two stages, i.e., drawing samples around the target object in the first stage and classifying each sample as the target object or as background in the second stage. The performance of existing trackers using deep classification networks is limited by two aspects. First, the positive samples in each frame are highly spatially overlapped, and they fail to capture rich appearance variations. Second, there exists extreme class imbalance between positive and negative samples. This paper presents the VITAL algorithm to address these two problems via adversarial learning. To augment positive samples, we use a generative network to randomly generate masks, which are applied to adaptively dropout input features to capture a variety of appearance changes. With the use of adversarial learning, our network identifies the mask that maintains the most robust features of the target objects over a long temporal span. In addition, to handle the issue of class imbalance, we propose a high-order cost sensitive loss to decrease the effect of easy negative samples to facilitate training the classification network. Extensive experiments on benchmark datasets demonstrate that the proposed tracker performs favorably against state-of-the-art approaches.
AB - The tracking-by-detection framework consists of two stages, i.e., drawing samples around the target object in the first stage and classifying each sample as the target object or as background in the second stage. The performance of existing trackers using deep classification networks is limited by two aspects. First, the positive samples in each frame are highly spatially overlapped, and they fail to capture rich appearance variations. Second, there exists extreme class imbalance between positive and negative samples. This paper presents the VITAL algorithm to address these two problems via adversarial learning. To augment positive samples, we use a generative network to randomly generate masks, which are applied to adaptively dropout input features to capture a variety of appearance changes. With the use of adversarial learning, our network identifies the mask that maintains the most robust features of the target objects over a long temporal span. In addition, to handle the issue of class imbalance, we propose a high-order cost sensitive loss to decrease the effect of easy negative samples to facilitate training the classification network. Extensive experiments on benchmark datasets demonstrate that the proposed tracker performs favorably against state-of-the-art approaches.
UR - http://www.scopus.com/inward/record.url?scp=85094554306&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85094554306&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85094554306
JO - Review of Economic Dynamics
JF - Review of Economic Dynamics
SN - 1094-2025
ER -