TY - JOUR
T1 - Background Modeling via Uncertainty Estimation for Weakly-supervised Action Localization
AU - Lee, Pilhyeon
AU - Wang, Jinglu
AU - Lu, Yan
AU - Byun, Hyeran
N1 - Publisher Copyright:
Copyright © 2020, The Authors. All rights reserved.
PY - 2020/6/12
Y1 - 2020/6/12
N2 - Weakly-supervised temporal action localization aims to detect intervals of action instances with only video-level action labels for training. A crucial challenge is to separate frames of action classes from the remaining ones, denoted as background frames (i.e., frames not belonging to any action class). Previous methods attempt background modeling by either synthesizing pseudo background videos with static frames or introducing an auxiliary class for background. However, they overlook the essential fact that background frames can be dynamic and inconsistent. Accordingly, we cast the problem of identifying background frames as out-of-distribution detection and isolate it from conventional action classification. On top of our base action localization network, we propose a module that estimates the probability of a frame being background (i.e., uncertainty [20]), which allows us to learn uncertainty from only video-level labels via multiple instance learning. A background entropy loss is further designed to reject background frames by forcing them to have a uniform probability distribution over action classes. Extensive experiments verify the effectiveness of our background modeling and show that our method significantly outperforms state-of-the-art methods on the standard benchmarks: THUMOS’14 and ActivityNet (versions 1.2 and 1.3). Our code and the trained model are available at https://github.com/Pilhyeon/Background-Modeling-via-Uncertainty-Estimation.
AB - Weakly-supervised temporal action localization aims to detect intervals of action instances with only video-level action labels for training. A crucial challenge is to separate frames of action classes from the remaining ones, denoted as background frames (i.e., frames not belonging to any action class). Previous methods attempt background modeling by either synthesizing pseudo background videos with static frames or introducing an auxiliary class for background. However, they overlook the essential fact that background frames can be dynamic and inconsistent. Accordingly, we cast the problem of identifying background frames as out-of-distribution detection and isolate it from conventional action classification. On top of our base action localization network, we propose a module that estimates the probability of a frame being background (i.e., uncertainty [20]), which allows us to learn uncertainty from only video-level labels via multiple instance learning. A background entropy loss is further designed to reject background frames by forcing them to have a uniform probability distribution over action classes. Extensive experiments verify the effectiveness of our background modeling and show that our method significantly outperforms state-of-the-art methods on the standard benchmarks: THUMOS’14 and ActivityNet (versions 1.2 and 1.3). Our code and the trained model are available at https://github.com/Pilhyeon/Background-Modeling-via-Uncertainty-Estimation.
UR - http://www.scopus.com/inward/record.url?scp=85095208833&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85095208833&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85095208833
JO - arXiv
JF - arXiv
ER -