Hierarchical convolutional features for visual tracking

Chao Ma, Jia Bin Huang, Xiaokang Yang, Ming Hsuan Yang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

818 Citations (Scopus)

Abstract

Visual object tracking is challenging as target objects often undergo significant appearance changes caused by deformation, abrupt motion, background clutter and occlusion. In this paper, we exploit features extracted from deep convolutional neural networks trained on object recognition datasets to improve tracking accuracy and robustness. The outputs of the last convolutional layers encode the semantic information of targets and such representations are robust to significant appearance variations. However, their spatial resolution is too coarse to precisely localize targets. In contrast, earlier convolutional layers provide more precise localization but are less invariant to appearance changes. We interpret the hierarchies of convolutional layers as a nonlinear counterpart of an image pyramid representation and exploit these multiple levels of abstraction for visual tracking. Specifically, we adaptively learn correlation filters on each convolutional layer to encode the target appearance. We hierarchically infer the maximum response of each layer to locate targets. Extensive experimental results on a large-scale benchmark dataset show that the proposed algorithm performs favorably against state-of-the-art methods.
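
The two steps named in the abstract (one correlation filter per convolutional layer, then a coarse-to-fine search over the per-layer responses) can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: the abstract does not specify the filter formulation, so a standard Fourier-domain ridge-regression filter is assumed, and the function names, the layer weights, the search radius and the regularizer `lam` are all hypothetical. Random arrays stand in for CNN feature maps, which are assumed to be already resized to a common spatial size.

```python
import numpy as np


def gaussian_label(height, width, sigma=2.0):
    """Desired filter output: a 2-D Gaussian peaked at the patch centre."""
    rows = np.arange(height)[:, None] - height // 2
    cols = np.arange(width)[None, :] - width // 2
    return np.exp(-(rows ** 2 + cols ** 2) / (2.0 * sigma ** 2))


def learn_filter(features, label, lam=1e-4):
    """Ridge-regression correlation filter for one layer (assumed formulation).

    features: (channels, height, width) array; label: (height, width) array.
    """
    X = np.fft.fft2(features, axes=(-2, -1))
    Y = np.fft.fft2(label)
    # Per-frequency energy summed over channels, plus the regularizer.
    energy = np.sum(X * np.conj(X), axis=0).real + lam
    return Y[None] * np.conj(X) / energy


def response_map(filt, features):
    """Correlation response of a learned filter on a new feature patch."""
    Z = np.fft.fft2(features, axes=(-2, -1))
    return np.real(np.fft.ifft2(np.sum(filt * Z, axis=0)))


def locate_coarse_to_fine(responses, weights, radius):
    """Refine the peak layer by layer.

    responses: response maps ordered from the deepest (coarsest) layer to the
    shallowest. Each finer layer is searched only within `radius` cells of the
    previous estimate, with the coarser responses kept as a weighted prior.
    Both the weighting and the square search window are assumptions.
    """
    combined = weights[0] * responses[0]
    row, col = np.unravel_index(np.argmax(combined), combined.shape)
    height, width = combined.shape
    for resp, weight in zip(responses[1:], weights[1:]):
        combined = combined + weight * resp
        r0, r1 = max(row - radius, 0), min(row + radius + 1, height)
        c0, c1 = max(col - radius, 0), min(col + radius + 1, width)
        window = combined[r0:r1, c0:c1]
        dr, dc = np.unravel_index(np.argmax(window), window.shape)
        row, col = r0 + dr, c0 + dc
    return int(row), int(col)


# Toy check: circularly shift the stand-in features and recover the shift.
rng = np.random.default_rng(0)
label = gaussian_label(32, 32)
layers = [rng.standard_normal((8, 32, 32)) for _ in range(3)]  # deep -> shallow
filters = [learn_filter(f, label) for f in layers]

shift = (3, -2)
shifted = [np.roll(f, shift, axis=(1, 2)) for f in layers]
responses = [response_map(w, z) for w, z in zip(filters, shifted)]

peak = locate_coarse_to_fine(responses, weights=[1.0, 0.5, 0.25], radius=4)
displacement = (peak[0] - 32 // 2, peak[1] - 32 // 2)
print(displacement)  # the applied shift, (3, -2)
```

In this toy setting every layer localizes equally well, so the example only checks the mechanics. With real CNN features the deepest layer would supply a stable but coarse estimate, and the shallower layers would refine it inside the search window.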

Original language: English
Title of host publication: 2015 International Conference on Computer Vision, ICCV 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3074-3082
Number of pages: 9
ISBN (Electronic): 9781467383912
DOIs: https://doi.org/10.1109/ICCV.2015.352
Publication status: Published - 2015 Feb 17
Event: 15th IEEE International Conference on Computer Vision, ICCV 2015 - Santiago, Chile
Duration: 2015 Dec 11 → 2015 Dec 18

Publication series

Name: Proceedings of the IEEE International Conference on Computer Vision
Volume: 2015 International Conference on Computer Vision, ICCV 2015
ISSN (Print): 1550-5499

Other

Other: 15th IEEE International Conference on Computer Vision, ICCV 2015
Country: Chile
City: Santiago
Period: 15/12/11 → 15/12/18

Fingerprint

Object recognition
Semantics
Neural networks

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Vision and Pattern Recognition

Cite this

Ma, C., Huang, J. B., Yang, X., & Yang, M. H. (2015). Hierarchical convolutional features for visual tracking. In 2015 International Conference on Computer Vision, ICCV 2015 (pp. 3074-3082). [7410709] (Proceedings of the IEEE International Conference on Computer Vision; Vol. 2015 International Conference on Computer Vision, ICCV 2015). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCV.2015.352
@inproceedings{463b3ff905a74f699443f4db1ea17d0d,
title = "Hierarchical convolutional features for visual tracking",
author = "Chao Ma and Huang, {Jia Bin} and Xiaokang Yang and Yang, {Ming Hsuan}",
year = "2015",
month = "2",
day = "17",
doi = "10.1109/ICCV.2015.352",
language = "English",
series = "Proceedings of the IEEE International Conference on Computer Vision",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "3074--3082",
booktitle = "2015 International Conference on Computer Vision, ICCV 2015",
address = "United States",

}

TY - GEN

T1 - Hierarchical convolutional features for visual tracking

AU - Ma, Chao

AU - Huang, Jia Bin

AU - Yang, Xiaokang

AU - Yang, Ming Hsuan

PY - 2015/2/17

Y1 - 2015/2/17

UR - http://www.scopus.com/inward/record.url?scp=84973869904&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84973869904&partnerID=8YFLogxK

U2 - 10.1109/ICCV.2015.352

DO - 10.1109/ICCV.2015.352

M3 - Conference contribution

AN - SCOPUS:84973869904

T3 - Proceedings of the IEEE International Conference on Computer Vision

SP - 3074

EP - 3082

BT - 2015 International Conference on Computer Vision, ICCV 2015

PB - Institute of Electrical and Electronics Engineers Inc.

ER -
