Tracking persons-of-interest via adaptive discriminative features

Shun Zhang, Yihong Gong, Jia Bin Huang, Jongwoo Lim, Jinjun Wang, Narendra Ahuja, Ming Hsuan Yang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Citations (Scopus)

Abstract

Multi-face tracking in unconstrained videos is a challenging problem as faces of one person often appear drastically different in multiple shots due to significant variations in scale, pose, expression, illumination, and make-up. Low-level features used in existing multitarget tracking methods are not effective for identifying faces with such large appearance variations. In this paper, we tackle this problem by learning discriminative, video-specific face features using convolutional neural networks (CNNs). Unlike existing CNN-based approaches that are only trained on large-scale face image datasets offline, we further adapt the pre-trained face CNN to specific videos using automatically discovered training samples from tracklets. Our network directly optimizes the embedding space so that the Euclidean distances correspond to a measure of semantic face similarity. This is technically realized by minimizing an improved triplet loss function. With the learned discriminative features, we apply the Hungarian algorithm to link tracklets within each shot and the hierarchical clustering algorithm to link tracklets across multiple shots to form final trajectories. We extensively evaluate the proposed algorithm on a set of TV sitcoms and music videos and demonstrate significant performance improvement over existing techniques.
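The three steps named in the abstract (a triplet loss on Euclidean distances, one-to-one tracklet linking within a shot, and hierarchical clustering across shots) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function names, the margin, the merge threshold, and the toy 2-D features are hypothetical. The loss shown is the standard triplet loss, not the improved variant the paper proposes, and the assignment step is solved by brute force instead of the Hungarian algorithm, which finds the same minimum-cost matching in polynomial time.

```python
import math
from itertools import permutations


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss on squared distances (NOT the paper's improved loss).

    It is zero once the negative is farther from the anchor than the
    positive by at least `margin` (a hypothetical value).
    """
    d_pos = euclidean(anchor, positive) ** 2
    d_neg = euclidean(anchor, negative) ** 2
    return max(0.0, d_pos - d_neg + margin)


def link_within_shot(ends, starts):
    """Minimum-cost one-to-one matching of tracklet ends to tracklet starts.

    Brute force over permutations, usable only for tiny inputs; it stands in
    for the Hungarian algorithm. Assumes len(ends) <= len(starts).
    """
    best_cost, best = math.inf, None
    for perm in permutations(range(len(starts)), len(ends)):
        cost = sum(euclidean(ends[i], starts[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best = cost, perm
    if best is None:
        raise ValueError("need at least as many starts as ends")
    return list(enumerate(best))


def cluster_across_shots(features, threshold):
    """Average-linkage agglomerative clustering of tracklet features.

    Repeatedly merges the two closest clusters and stops when the closest
    pair is farther apart than `threshold` (a hypothetical stopping rule).
    """
    clusters = [[i] for i in range(len(features))]

    def dist(c1, c2):
        total = sum(euclidean(features[i], features[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, a, b = min(
            (dist(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters


if __name__ == "__main__":
    print(triplet_loss([0, 0], [0, 1], [3, 0]))
    print(link_within_shot([[0, 0], [5, 5]], [[5, 4], [0, 1]]))
    print(cluster_across_shots([[0, 0], [0, 1], [10, 10], [10, 11]], 2.0))
```

In this toy setup each tracklet is represented by a single feature vector. A real system would take those vectors from the adapted CNN embedding and would likely add temporal and spatial constraints to the linking cost.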

Original language: English
Title of host publication: Computer Vision - 14th European Conference, ECCV 2016, Proceedings
Editors: Jiri Matas, Max Welling, Bastian Leibe, Nicu Sebe
Publisher: Springer Verlag
Pages: 415-433
Number of pages: 19
ISBN (Print): 9783319464534
DOI: https://doi.org/10.1007/978-3-319-46454-1_26
Publication status: Published - 2016 Jan 1

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9909 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Fingerprint

Person
Face
Neural networks
Neural Networks
Clustering algorithms
Lighting
Semantics
Trajectories
Face Tracking
Multi-target Tracking
Hierarchical Clustering
Training Samples
Loss Function
Euclidean Distance
Music
Clustering Algorithm
Illumination
Optimise
Trajectory
Evaluate

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Zhang, S., Gong, Y., Huang, J. B., Lim, J., Wang, J., Ahuja, N., & Yang, M. H. (2016). Tracking persons-of-interest via adaptive discriminative features. In J. Matas, M. Welling, B. Leibe, & N. Sebe (Eds.), Computer Vision - 14th European Conference, ECCV 2016, Proceedings (pp. 415-433). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9909 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-46454-1_26
Zhang, Shun ; Gong, Yihong ; Huang, Jia Bin ; Lim, Jongwoo ; Wang, Jinjun ; Ahuja, Narendra ; Yang, Ming Hsuan. / Tracking persons-of-interest via adaptive discriminative features. Computer Vision - 14th European Conference, ECCV 2016, Proceedings. editor / Jiri Matas ; Max Welling ; Bastian Leibe ; Nicu Sebe. Springer Verlag, 2016. pp. 415-433 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{c3b15ca40b7e4a59a23b9f0d04d74f61,
title = "Tracking persons-of-interest via adaptive discriminative features",
abstract = "Multi-face tracking in unconstrained videos is a challenging problem as faces of one person often appear drastically different in multiple shots due to significant variations in scale, pose, expression, illumination, and make-up. Low-level features used in existing multitarget tracking methods are not effective for identifying faces with such large appearance variations. In this paper, we tackle this problem by learning discriminative, video-specific face features using convolutional neural networks (CNNs). Unlike existing CNN-based approaches that are only trained on large-scale face image datasets offline, we further adapt the pre-trained face CNN to specific videos using automatically discovered training samples from tracklets. Our network directly optimizes the embedding space so that the Euclidean distances correspond to a measure of semantic face similarity. This is technically realized by minimizing an improved triplet loss function. With the learned discriminative features, we apply the Hungarian algorithm to link tracklets within each shot and the hierarchical clustering algorithm to link tracklets across multiple shots to form final trajectories. We extensively evaluate the proposed algorithm on a set of TV sitcoms and music videos and demonstrate significant performance improvement over existing techniques.",
author = "Shun Zhang and Yihong Gong and Huang, {Jia Bin} and Jongwoo Lim and Jinjun Wang and Narendra Ahuja and Yang, {Ming Hsuan}",
year = "2016",
month = "1",
day = "1",
doi = "10.1007/978-3-319-46454-1_26",
language = "English",
isbn = "9783319464534",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
volume = "9909",
publisher = "Springer Verlag",
pages = "415--433",
editor = "Jiri Matas and Max Welling and Bastian Leibe and Nicu Sebe",
booktitle = "Computer Vision - 14th European Conference, ECCV 2016, Proceedings",
address = "Germany",
}

Zhang, S, Gong, Y, Huang, JB, Lim, J, Wang, J, Ahuja, N & Yang, MH 2016, Tracking persons-of-interest via adaptive discriminative features. in J Matas, M Welling, B Leibe & N Sebe (eds), Computer Vision - 14th European Conference, ECCV 2016, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9909 LNCS, Springer Verlag, pp. 415-433. https://doi.org/10.1007/978-3-319-46454-1_26

Tracking persons-of-interest via adaptive discriminative features. / Zhang, Shun; Gong, Yihong; Huang, Jia Bin; Lim, Jongwoo; Wang, Jinjun; Ahuja, Narendra; Yang, Ming Hsuan.

Computer Vision - 14th European Conference, ECCV 2016, Proceedings. ed. / Jiri Matas; Max Welling; Bastian Leibe; Nicu Sebe. Springer Verlag, 2016. p. 415-433 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9909 LNCS).

TY  - GEN
T1  - Tracking persons-of-interest via adaptive discriminative features
AU  - Zhang, Shun
AU  - Gong, Yihong
AU  - Huang, Jia Bin
AU  - Lim, Jongwoo
AU  - Wang, Jinjun
AU  - Ahuja, Narendra
AU  - Yang, Ming Hsuan
PY  - 2016/1/1
Y1  - 2016/1/1
N2  - Multi-face tracking in unconstrained videos is a challenging problem as faces of one person often appear drastically different in multiple shots due to significant variations in scale, pose, expression, illumination, and make-up. Low-level features used in existing multitarget tracking methods are not effective for identifying faces with such large appearance variations. In this paper, we tackle this problem by learning discriminative, video-specific face features using convolutional neural networks (CNNs). Unlike existing CNN-based approaches that are only trained on large-scale face image datasets offline, we further adapt the pre-trained face CNN to specific videos using automatically discovered training samples from tracklets. Our network directly optimizes the embedding space so that the Euclidean distances correspond to a measure of semantic face similarity. This is technically realized by minimizing an improved triplet loss function. With the learned discriminative features, we apply the Hungarian algorithm to link tracklets within each shot and the hierarchical clustering algorithm to link tracklets across multiple shots to form final trajectories. We extensively evaluate the proposed algorithm on a set of TV sitcoms and music videos and demonstrate significant performance improvement over existing techniques.
UR  - http://www.scopus.com/inward/record.url?scp=84990060978&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84990060978&partnerID=8YFLogxK
U2  - 10.1007/978-3-319-46454-1_26
DO  - 10.1007/978-3-319-46454-1_26
M3  - Conference contribution
AN  - SCOPUS:84990060978
SN  - 9783319464534
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 415
EP  - 433
BT  - Computer Vision - 14th European Conference, ECCV 2016, Proceedings
A2  - Matas, Jiri
A2  - Welling, Max
A2  - Leibe, Bastian
A2  - Sebe, Nicu
PB  - Springer Verlag
ER  - 

Zhang S, Gong Y, Huang JB, Lim J, Wang J, Ahuja N et al. Tracking persons-of-interest via adaptive discriminative features. In Matas J, Welling M, Leibe B, Sebe N, editors, Computer Vision - 14th European Conference, ECCV 2016, Proceedings. Springer Verlag. 2016. p. 415-433. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-319-46454-1_26