Abstract
The audio-visual video parsing task aims to temporally parse a video into audio or visual event categories. However, temporally annotating audio and visual events is labor-intensive, which hampers the learning of a parsing model. To this end, we propose to explore additional cross-video and cross-modality supervisory signals to facilitate weakly-supervised audio-visual video parsing. The proposed method exploits both the common and the diverse event semantics across videos to identify audio or visual events. In addition, our method explores event co-occurrence across the audio, visual, and audio-visual streams. We leverage the explored cross-modality co-occurrence to localize segments of target events while excluding irrelevant ones. The supervisory signals discovered across different videos and modalities can greatly facilitate training with only video-level annotations. Quantitative and qualitative results demonstrate that the proposed method performs favorably against existing methods on weakly-supervised audio-visual video parsing.
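The abstract notes that the parser must make segment-level decisions while training uses only video-level annotations. A common way to bridge that gap in weakly-supervised parsing is multiple-instance-learning (MIL) pooling: per-segment class probabilities are aggregated into one video-level prediction, which is then scored against the video-level labels. The following is a minimal sketch of attention-weighted MIL pooling with binary cross-entropy, assuming per-segment probabilities and attention logits are already computed by some network. It is a generic baseline technique, not the method proposed in this paper, and the helper names `mil_pool` and `bce_loss` are hypothetical.

```python
import math
from typing import List, Sequence


# Hypothetical helper: generic attention-based MIL pooling, not the paper's method.
def mil_pool(segment_probs: Sequence[Sequence[float]],
             attn_logits: Sequence[Sequence[float]]) -> List[float]:
    """Pool per-segment class probabilities into video-level probabilities.

    segment_probs[t][c]: probability that class c occurs in segment t.
    attn_logits[t][c]:   unnormalized attention score of segment t for class c.
    For each class c, the attention weights are a softmax over segments, and the
    video-level probability is the attention-weighted average of segment probs.
    """
    num_classes = len(segment_probs[0])
    pooled = []
    for c in range(num_classes):
        logits = [row[c] for row in attn_logits]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        pooled.append(sum(e / total * row[c] for e, row in zip(exps, segment_probs)))
    return pooled


# Hypothetical helper: mean binary cross-entropy against video-level labels.
def bce_loss(pred: Sequence[float], target: Sequence[int], eps: float = 1e-7) -> float:
    loss = 0.0
    for p, y in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss / len(pred)


# Toy example: 3 segments, 2 event classes; the video-level label says only class 0 occurs.
segment_probs = [[0.9, 0.1], [0.2, 0.1], [0.8, 0.3]]
uniform_attn = [[0.0, 0.0]] * 3  # equal attention reduces pooling to a plain average
video_probs = mil_pool(segment_probs, uniform_attn)
print(video_probs)                    # approximately [0.633, 0.167]
print(bce_loss(video_probs, [1, 0]))
```

With uniform attention the pooling is simply an average over segments; learned attention lets the model emphasize the segments most likely to contain each event. This sketch covers only the video-level objective and does not model the cross-video or cross-modality signals the paper proposes.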
Original language | English |
---|---|
Title of host publication | Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021 |
Editors | Marc'Aurelio Ranzato, Alina Beygelzimer, Yann Dauphin, Percy S. Liang, Jenn Wortman Vaughan |
Publisher | Neural information processing systems foundation |
Pages | 11449-11461 |
Number of pages | 13 |
ISBN (Electronic) | 9781713845393 |
Publication status | Published - 2021 |
Event | 35th Conference on Neural Information Processing Systems, NeurIPS 2021 - Virtual, Online; Duration: 2021 Dec 6 → 2021 Dec 14 |
Publication series
Name | Advances in Neural Information Processing Systems |
---|---|
Volume | 14 |
ISSN (Print) | 1049-5258 |
Conference
Conference | 35th Conference on Neural Information Processing Systems, NeurIPS 2021 |
---|---|
City | Virtual, Online |
Period | 2021/12/6 → 2021/12/14 |
Bibliographical note
Funding Information: This work was supported in part by the Ministry of Science and Technology under grants 109-2221-E-009-113-MY3, 110-2628-E-A49-008, and 110-2634-F007-015. It was also funded in part by Qualcomm through a Taiwan University Research Collaboration Project, the Higher Education Sprout Project of the National Yang Ming Chiao Tung University, and the Ministry of Education.
Publisher Copyright:
© 2021 Neural information processing systems foundation. All rights reserved.
All Science Journal Classification (ASJC) codes
- Computer Networks and Communications
- Information Systems
- Signal Processing